An Interfacial Thermodynamics Model for Protein Stability

Proteins are important macromolecules that exhibit thermodynamic and kinetic properties that are highly tuned to facilitate biological function within limited ranges of environmental conditions. Despite having a wealth of understanding of the interactions that affect protein stability [Dill, 1990; Pace, et al. 2004], such as the hydrophobic effect, hydrogen bonding, packing, solvation and electrostatic effects: Predicting thermodynamic properties of proteins is difficult because these interactions simultaneously work together within the molecular structure comprising of heterogeneous microenvironments that change dynamically as the conformational state of the protein changes. Consequently, a protein is truly a complex system [Bar-Yam, 1997] where thermodynamic and other emergent physical properties are sensitive to small perturbations in protein structure or its environment. It remains an open problem to develop models that can accurately predict protein stability, ligand-protein binding affinities and allosteric response, all of which are critical to the function of a protein [Petsko & Ringe, 2004; Klepeis, et al. 2004; Bray & Duke, 2004].


Introduction
Proteins are important macromolecules that exhibit thermodynamic and kinetic properties that are highly tuned to facilitate biological function within limited ranges of environmental conditions. Despite having a wealth of understanding of the interactions that affect protein stability [Dill, 1990;Pace, et al. 2004], such as the hydrophobic effect, hydrogen bonding, packing, solvation and electrostatic effects: Predicting thermodynamic properties of proteins is difficult because these interactions simultaneously work together within the molecular structure comprising of heterogeneous microenvironments that change dynamically as the conformational state of the protein changes. Consequently, a protein is truly a complex system [Bar-Yam, 1997] where thermodynamic and other emergent physical properties are sensitive to small perturbations in protein structure or its environment. It remains an open problem to develop models that can accurately predict protein stability, ligand-protein binding affinities and allosteric response, all of which are critical to the function of a protein [Petsko & Ringe, 2004;Klepeis, et al. 2004;Bray & Duke, 2004].

Available computational approaches
A rigorous brute force method that can in principle computationally predict thermodynamic properties of proteins is through all-atom molecular dynamics (MD) simulation in explicit solvent [Lindorff-Larsen, et al. 2009]. In this approach, the equations of motions for all atoms must be integrated over femtosecond time-steps out to timescales extending to hours or more. Then the emergent properties governing how a protein functions must be extracted from the massive amount of atomic coordinates contained in the MD trajectory. Invoking the ergodic hypothesis to determine thermodynamic averages of physical quantities by time averaging over many such long trajectories makes this approach painstakingly slow, especially when one would like to scan over large numbers of possible what-if scenarios for the purpose of finding conditions that yield a desired "engineered" response, such as identifying specific mutations, ligands or solvent composition. Unfortunately, it is not yet possible to reduce statistical errors by employing MD simulations to a point where they are negligible, especially in the context of high-throughput applications.
Despite practical limitations in sampling, MD has proven indispensible for gaining insight into protein function over limited timescales [Gunsteren, et al. 2006]. Furthermore, a coarsewww.intechopen.com grained MD approach with implicit solvent is faster, and multiscale modelling can be fast while achieving good accuracy [Nielsen, et al. 2010]. While multiscale MD is promising, substantial computer resources are required that preclude high-throughput applications, including studies that systematically vary temperature, pressure, pH, and concentration of co-solvents. Therefore, it is not yet feasible in engineering design applications to calculate free energy and other thermodynamic properties of proteins from MD simulations.
There are alternative approaches that represent the fold of a protein in terms of an Ising-like model involving discrete "spin" variables assigned to residues [Hilser & Freire, 1996;Muñoz 2001;Bakk & Hoye 2002;Zamparo & Pelizzola, 2006]. By discretizing macrostates, conformational ensembles that include native, unfolded and partially unfolded states can be generated efficiently [Jacobs, 2010]. Moreover, calculating thermodynamic properties is feasible as a result of the approximations used to reduce degrees of freedom (DOF). One such critical approximation common to Ising-like models is that the three-dimensional native structure is used as a template. Spin variables decorate the template to partition the protein at the residue level into native-like (spin up) and disordered (spin down) regions. For N residues, the 2 N possible spin configurations retain only topological significance because geometrical information beyond the native state structure is not considered. The models are simple enough that for practical purposes exact thermodynamic properties (of the model) can be calculated. However, because non-native interactions are not accessible, the ensemble of conformational states generated (expressed by microscopic "spin" configurations) is not complete physically. Consequently, Ising-like model predictions can be made rapidly, they are precise, but accuracy becomes questionable because the models tend to be oversimplified.

Identifying a fundamental problem
Most Ising-like models that describe protein stability are based on the concept of free energy decomposition (FED), where the free energy of a system is partitioned into parts by assigning enthalpy and entropy contributions to subsystems. Assuming transferability, a ledger is created to account for the gain or loss of enthalpy and entropy relative to a reference state. The naïve method is to sum over all enthalpy and entropy contributions to arrive at the total free energy of the protein in a specific macrostate. This approach is extremely fast, and it is accurate when all subsystems are essentially independent of one another. Unfortunately, errors occur when DOF within subsystems couple. Since cooperative behaviour is typically found in proteins, the assumption of additivity generally fails [Mark & van Gunsteren, 1994;Dill, 1997], causing inaccurate predictions and/or model parameters to be non-transferable.
The application of FED and assumption of additivity is commonly employed to interpret single site mutations and ligand binding affinities. One reason why it is not obvious that the assumption of additivity is flawed is because non-transferability of model parameters also derives from a lack of completeness in modelling interactions, and, Ising-like models are notoriously incomplete. Nevertheless, the fundamental problem is in treating subsystems as independent, which is tantamount to assuming all internal DOF are independent, and this leads to a dramatic overestimate of conformational entropy in the native-state relative to the unfolded state. Fortunately, this problem can be largely overcome by keeping track of the correlations between DOF using concepts of network-rigidity (also called graph-rigidity).

The distance constraint model
The challenge of accurately predicting protein stability lies in developing a model that can account for all essential types of interactions while demanding the model is computationally tractable for high-throughput applications. Toward this goal, I describe a Distance Constraint Model (DCM) that is an Ising-like model that employs a FED, but the assumption of additivity is not used. The total free energy is calculated through the process of Free Energy Reconstitution (FER) to account for coupling of DOF between subsystems. The critical component of the FER is to employ network-rigidity as a long-range mechanical interaction to govern the non-additive nature of conformational entropy.
The DCM has been employed in various forms (differences in model details and methods to solve the DCM) to describe the helix-coil transition [Jacobs, et al. 2003;Lee, et al. 2004;Vorov, et al. 2009;Wood, et al. 2011], the hairpin-coil transition [Jacobs & Fairchild, 2007a] and protein stability [Livesay, et al. 2004;Jacobs & Dallakayan, 2005;Jacobs, et al. 2006a;Vorov, et al. 2011]. Moreover, the DCM predicts substructures within a protein that are rigid or flexible, and identifies sets of atoms that are co-rigid or co-flexible within a correlated motion. Many studies on proteins using a minimal DCM (mDCM) have elucidated stability/flexibility relationships important to function [Livesay & Jacobs, 2006;Livesay, et al. 2008;Mottonen, et al. 2009;Verma, et al. 2010] including the study of allostery [Mottonen, et al. 2010]. The DCM provides a good estimate for conformational entropy in simple loop systems compared to exact calculations ]. This body of work has been reviewed previously [Jacobs, 2006b;Jacobs, et al. 2012]. The success of the DCM across disparate systems indicates that it is well suited for high-throughput applications that include macromolecular design, large-scale comparative analysis and drug discovery.

Advantages of the DCM and its limitations
The DCM offers several advantages over other models for protein stability, listed in order of importance: 1) Network-rigidity is employed to account for the coupling of DOF between subsystems for better estimates of conformational entropy, thereby restoring the utility of a FED. 2) Structural characteristics are linked to thermodynamic properties by associating mechanical constraints to enthalpy-entropy compensation mechanisms. 3) Molecular structure is represented at the all-atom level. 4) The DCM is not restricted to use template structures, although use of templates allows rapid calculation of the partition function. 5) Relationships between flexibility and stability are quantified, which gives insight into the mechanisms of protein function [Luque, et al. 2002]. 6) The DCM can be solved efficiently in multiple ways. 7) The DCM is a general approach that is not restricted to proteins.
Limitations of the DCM include: 1) Calculation of conformational entropy is approximate. Errors are introduced when the geometrical problem is simplified to a topological one (explained below), and because loop corrections are neglected. 2) Long-range electrostatic interactions are not considered. To acknowledge these limitations, the DCM is formulated as an empirical spin-model. 3) Moreover, in the mDCM, mean-field approximations are used to replace many essential enthalpy-entropy compensation mechanisms that are not explicitly modelled, especially in regards to solvation effects. Thus, effective parameters are required to compensate for mDCM inadequacies, suggesting that non-transferability in parameters observed across diverse proteins largely derives from discretionary oversimplifications.

Generalization of the DCM
Despite several studies indicating the mDCM is useful for predicting flexibility and stability of proteins, the merits and limitations of the DCM paradigm remain to be assessed. Many limitations can be substantially reduced by generalizing the form of the DCM to provide a more accurate estimate for conformational entropy without much more computational cost. With the goal of establishing an accurate high-throughput empirical approach, adding terms to model solvation effects offers the least amount of effort for the greatest improvement in accuracy, parameter transferability and retaining rapid calculations. Here, I will redevelop the DCM in the context of interfacial thermodynamics, which has not been done before.
The DCM has four key elements each of which will be discussed separately. First, networkrigidity is explicitly considered as a long-range mechanical interaction to account for the non-additive property of conformational entropy. Second, the FED provides a complete set of elementary subsystems and interaction types that are uniquely identified based on the three dimensional structure of the protein and its macrostate. Third, order parameters are used to define the macrostate of a protein that include the composition of protein-solvent interactions and the number of native-like intramolecular interactions. Forth, applying the FER to each macrostate allows the free energy landscape (FEL) to be calculated.

Linking network-rigidity to conformational entropy
As the amplitude of motion of a flexible molecular structure increases, the conformational entropy will increase accordingly. By ascribing an entropic measure to distance constraints, the DCM posits a quantitative link between network-rigidity and conformational entropy [Jacobs, et al. 2003]. This link requires assigning and characterizing tolerances to constraints.

Draconian view of network-rigidity
A draconian view of network-rigidity in proteins is that some interactions can be modelled by placing a distance constraint between certain pairs of atoms, while the distances between all other pairs of atoms are not fixed. Given a network of distance constraints, the program FIRST (Floppy Inclusion and Rigid Substructure Topography) gives a detailed mechanical analysis of protein structure [Jacobs, et. al. 2001] that includes rigid cluster decomposition (RCD). A RCD defines all rigid substructures where the distance between all pairs of atoms is fixed within a substructure. As such, a protein is modelled as a collection of rigid bodies, where conformational change is through relative motions between rigid substructures.
The RCD depends on the set of distance constraints modelling various types of interactions. Covalent bonds are always modelled as distance constraints, while other interactions may or may not contribute to distance constraints. For example, the hydrogen bond (H-bond) has a wide variation of strength. In FIRST, a dilution analysis is employed to represent a H-bond as 5 distance constraints when its energy is lower than some cut-off energy. By lowering this cut-off, more H-bonds are identified as weak, which do not contribute distance constraints. As such, a protein will undergo a mechanical transition from being mostly a rigid structure with flexible pockets (all H-bonds contribute to distance constraints whether weak or strong) to a globally floppy structure interconnecting many small rigid clusters (only the strongest H-bonds contribute distance constraints). This dilution idea [Jacobs, et al. 1999] was later interpreted as a kinetic mechanism for protein unfolding .
The notion of a rigid substructure is an idealization. For example, FIRST often identifies a long alpha-helix as a rigid substructure, but an alpha-helix actually bends and twists (just like a metal bar can do!). Nevertheless, the RCD is useful to understand long-time scales, where small amplitude conformational deviations in substructures within a protein are neglected, such as the compression, elongation, bending or twisting of an alpha-helix. The rapid calculations for the RCD by FIRST (requiring tiny fractions of a second) has proved to be useful in making comparative studies across protein families, and to elucidate common structural features regarding flexibility important to function Rader, et al. 2004;Fuxreiter, et al. 2005;Costa, et al. 2006;Radestock & Gohlke 2008;Mamonova et al. 2008;Rader, 2010;Heal, et al. 2011;Radestock & Gohlke 2011]. It has also been shown there is a statistically significant correlation between the propagation of rigidity between two mutation sites within a protein to non-additive effects in free energy cycles describing double mutant studies [Istomin, et al. 2008].
If all weak interactions are allowed to contribute to distance constraints, FIRST will predict a completely rigid protein, failing to be of any use. Instead, excluded volume effects due to van der Waals interactions are included in geometrical simulation that allows rigid clusters to wiggle about without violating any distance constraint, or without atoms passing through one another. FRODA (Framework Rigidity Optimized Dynamics Algorithm) is one method [Wells, et al. 2005;Farrell, et al. 2010], among others [Lei, et al. 2004;Thomas, et al. 2007;Jimenez-Roldan, et al. 2011;Yao, et al. 2012] that uses FIRST to identify a native RCD that is preserved during the simulation. FRODA efficiently explores the native state ensemble of conformations [Jacobs, 2010;David & Jacobs 2011]. The main limitation is that the native state structure defines the distance constraints, and once set, they never break. This athermal mechanical description of a protein cannot account for non-native contacts, making largescale conformational change between different conformational states impossible if nativecontacts in either state must be broken along the pathway. However, pathways between conformational states can be achieved by using a common collection of distance constraints between native conformations [Farrell, et al. 2010].

Liberated view of network-rigidity
The draconian view suffers from three awkward problems. There is no prescription for (1) how to determine the proper number of distance constraints to model an interaction, and for (2) when an interaction will contribute distance constraints. Also, (3) the selection threshold that determines whether distance constraints are placed in the network causes artificial discontinuities. A discontinuity is illustrated by two nearly identical H-bonds having a small energy difference slightly above and lower than the cut-off energy. The H-bond with higher energy is modelled as infinitely weak (not present) while the other is infinitely strong! By assigning tolerances to configuration variables, the accessible range for each variable can be quantified. The distinction between a DOF and a constraint is reflected in the process of before and after a tolerance is assigned. That is, a variable acts as a DOF before its tolerance is assigned. Once the range of a variable is restricted, it acts as a generic distance constraint. Applying graph-rigidity algorithms determine if a generic distance constraint is redundant or independent. A redundant distance constraint has zero tolerance because its length is determined by independent constraints. The length of independent distance constraints can vary within their tolerances. After all variables are analysed, a system of N atoms is www.intechopen.com generically rigid (having 0 internal DOF) consisting of 3 6 N  independent distance constraints all with finite tolerances that quantify the accessible geometrical embedding of the constraint topology. In other words, for a given constraint topology there will be an entire ensemble of geometrical realizations that are consistent with the accessible tolerances. The conformational entropy is then related to the logarithm of this geometrical degeneracy.
In this view, the three awkward problems mentioned above are eliminated. A system of N atoms consists of 3 6 N  internal DOF. Similarly, a subsystem with n atoms for 3 n  has 36 n  internal DOF, and for 2 n  has 1 internal DOF. A proper description of a subsystem should only be in terms of independent configuration variables that need to be assigned tolerances. Therefore, the number of contributing constraints for a subsystem of n atoms is just equal to the number of its internal DOF. For example, regarding a H-bond as a 3 atom subsystem (the donor, hydrogen and acceptor) requires 3 internal DOF to specify its configuration. Thus, a H-bond will contribute 3 generic distance constraints whenever it forms, and it will randomly form or break based on a probability that is appropriate for the system to be in thermodynamic equilibrium, which removes arbitrary thresholds.

Illustration: A two dimensional quadrilateral
To illustrate the points discussed above, consider 4 particles confined to a plane as shown in Fig. 1A  The four cases to be considered are: g = 8 kcal/mol or 0, and 2 b  Å or 1.5321 Å. For the harmonic potential where g = 8 kcal/mol; the quadrilateral defines an elastic network. A flat potential ( g = 0) has a constant energy as lengths or angles vary within a tolerance, but an infinite energy outside the tolerance. A sketch of how much the quadrilateral can deform for a flat potential is shown in Fig. 1B. Shaded regions define accessible geometries for each angular range (5 rows) and distinct constraint topology (4 columns) due to the fluctuating The accessible range of motion for each coarse-grained configuration is shown in grey. The middle two columns show how the more limited range of motion along a diagonal interaction compared to the range allowed by the torsion-forces is the determining factor restricting the motion. The symbols (, L, X) indicate which configurations are (very probable, improbable, inaccessible). The index, , is used to label all accessible configurations. In the example considered here, the inaccessible configurations essentially have infinite energy, but in a more refined model they could have a finite energy. (C) Treating interactions as independent, the probability density, ()   , for the length of a distance constraint is plotted for central-, fluctuating-and torsion-interactions, where the lower and upper graphs correspond to the cases: g = 8 kcal/mol and g = 0.
www.intechopen.com interaction along the diagonals. Fluctuating interactions break and form to reflect hidden DOF in the system not modelled. For example, a fluctuating interaction along a diagonal of the quadrilateral could model a H-bond that may appear or not depending on the details of the electronic structure of the donor, hydrogen and acceptor atoms.

Free energy decomposition and reconstitution
The configuration integral, is the potential energy of the system in the -th coarse-grained configuration, where R is the ideal gas constant, T is absolute temperature, and 0.002

 
Å is a length scale factor (in classical statistical mechanics entropy is defined up to an arbitrary constant). Since rigid body DOF are not of interest, the quadrilateral is translated and rotated so that particle 1 is at the origin while particle 2 is along the x-axis, requiring a five dimensional integral. From Q  , the total internal free energy is given as ln( ) FR T Q Jacobian is inserted to convert from an angle to a length measure using an approximation that constrains 12 . Consequently, a slight underestimate of the entropy is incurred because the angle is actually defined without imposing these length constraints. The probability density, ()   , for finding the distance between a pair of particles (the plot is for T=400K) for a particular interaction is shown in Fig. 1C, which is given by the Boltzmann factor normalized by the corresponding partition function.
For each interaction type, x , the thermal energy, x U , and pure entropy, x  , are plotted in Fig. 2A and Fig. 2B respectively for a flat potential energy, and in Fig. 2C and Fig. 2D for a harmonic potential energy. Notice that x  decreases when the peak width in the probability density, ()   , decreases as shown in Fig. 1C. Interestingly, transforming from an angle to a length variable to describe torsion interactions causes k t  to decrease as the angle increases (larger k) because a smaller length variation results from the same angular deviation. This difference is important when network rigidity is used to calculate conformational entropy.
For an additive FER: The free energy for the -th configuration is:  The pure entropy also has no temperature dependence. For a harmonic potential: (C) Thermal energy has temperature dependence according to the equipartition theorem, but this cannot be seen using the energy scale of graph-A. Therefore, only 2 t U is shown on a magnified scale. (D) The pure entropy monotonically increases, but it has a limiting plateau of the flat potential. Note that graph-D uses the same vertical axis as graph-B to assist in direct comparisons.
To proceed further, a more general indexing is needed to label specific interactions. For the quadrilateral network, there are four torsion-force, two fluctuating and four central-force interactions. Let j=1,2,3,4 label the angle variables 1234 ,,,  . Let j=5,6 label the fluctuating interactions between particles (1,3) and (2,4). Let j=7,8,9,10 label the central-force interactions between particles (1,2), (2,3), (3,4) and (4,1). The j index identifies a particular interaction within a configuration that is labelled by the  index. In application to proteins, the  index labels a particular accessible coarse-grained geometry that a subsystem can explore. Thus, the index-pair, j  , is a general indexing scheme used to uniquely label energy and entropy contributions. In this example, each j  maps to the x index, where 12345 ,,,,,, For a non-additive FER: The free energy of configuration,  , is given as: where cnf U  is the lowest possible energy of the selected basin, and vib U  describes the energy associated with vibrations within the basin. Furthermore, the conformational entropy, R   , is associated with any continuous deformation that is able to take place within a basin over a constant (or nearly constant) energy surface. The three functions are explicitly given as: where the non-additive contributions can be identified by the terms containing the j q  variables. The j E  define reference energies, . Quantities for individual interactions, such as are to be worked out in advance and stored in lookup tables to define the FED. The variable j n  can equal (1 or 0) when the j-th interaction is (present or not present) in the -th configuration. For quenched interactions, The variable j q  can equal (1 or 0) when the j-th interaction is represented by (an independent or redundant) distance constraint. It is important to notice that the assignment of which distance constraints are independent or redundant is not unique. Therefore,   will also not be unique! However, because distance constraints with smaller tolerances restrict motion more than those with greater tolerances (see Fig. 1B and view the middle two columns), the lowest value that can be obtained for   yields the best estimate for the net conformational entropy in the basin. Therefore, j q  are determined by augmenting a preferential rule to the graph-rigidity algorithm that manifest as building the network by placing one distance constraint at a time in the order defined by the sorted set of j   from smallest to largest.
As can be seen from Fig. 2B and Fig. 2D, the rank ordering of pure entropies from smallest to largest values when considering all interaction types does not depend on temperature. The same rank ordering is obtained whether flat or harmonic potentials are considered. The values for j n  , j q  and corresponding jx   labels are listed in Table 1 to enable explicit hand calculation of F  . Note that an additive FER is obtained by setting all j q  = 1, so that all interactions are considered independent, despite being inconsistent with network rigidity. Regardless of the FER employed, the free energy of the system with n fluctuating interactions present is given by: The () Fn are calculated in three different ways: (1) Exact answers are obtained by numerically performing the 5 dimensional configuration integral for each Q  , and approximate answers are obtained by employing an (2) additive and (3) non-additive FER. In Fig. 3, () Fn are plotted for the four cases g = 8 or 0 kcal/mol, and 2 b  or 1.5321 Å. Table 2 summarizes the relative errors for the additive and nonadditive FER predictions for the thermal energy and entropy of the system separately.
Comparing to the exact answers using Fig. 3 and Table 2, it is seen that the predictions of the non-additive FER are good for the harmonic and flat potential energy cases when b= 2 a. This is because the geometry of the fluctuating-interaction is commensurate with the torsioninteraction. Conversely, when b=1.5321a, the predictions breakdown because geometrical frustration between these two interactions cause large strain energy in the network. The additive and non-additive FER procedures are now juxtaposed. The results for this simple quadrilateral network example highlight the key concepts that are applied to proteins. Table 1. The variables j n  , j q  and corresponding x -indices are specified for all allowed configurations (for =1-8) and interactions (for j=1-10). The j  pair-index for the j n  and j q  variables is suppressed. Note that j q  can be determined by first placing the four x=c type of interactions in the network. Once the very next constraint is added, it will reduce one more DOF to make the network rigid. Therefore, the total number of independent distance constraints is 5 for all configurations, which can be checked by inspection.  Table 2. The relative percent errors (denoted as %e) are given for energies (denoted as E-) and entropies (denoted as S-) for the additive FER (denoted as a-) and the non-additive FER prediction (denoted as p-) for when the system has 0, 1, 2 fluctuating interactions present (denoted as # fip) for g  8 kcal/mol and 0 on the top and bottom three rows respectively. Also the ( 2a b  and 1.5321 ba  ) cases are shown on the (left and right) sides of the table. These percent errors apply to T=300K, and reflect typical values at all other temperatures.
The additive approach is completely wrong for predicting conformational entropy because it predicts conformational entropy of a network increases as more constraining interactions appear. Thus: The conformational entropy of a protein in its native state will always be predicted to be greater than that of the unfolded protein when using an additive model. Although energy estimates from the additive FED are good, there are two sources of errors that occur for harmonic potentials (not for flat potentials). Part of this discrepancy is directly caused by over counting thermal energy contributions (i.e. 2 1 RT ) for all quadratic DOF, rather than just the independent ones. This problem is more severe with respect to entropy estimates, and these errors cause relative statistical weighting of the various configurations in the ensemble to error, thereby indirectly leading to errors in average energies.
A non-additive FED naturally models energy-entropy compensation mechanisms that link atomic structure characteristics to thermodynamic response. As interactions form within a protein, such as from H-bonds or packing, more constraints are placed on conformational motions, leading to a decrease in conformational flexibility and entropy. In particular, if an energetically favourable interaction forms in an otherwise flexible region, it will constrain motions and decrease conformational entropy. However, if the interaction forms within an otherwise rigid region, no entropic cost is incurred because the constraint that is imposed is redundant. This effect is the underlying source of non-additivity. Thus, network rigidity is an interaction between entropic contributions that provides a simple mechanism for groups of interactions that constrain conformational flexibility to form and break cooperatively.
Errors appear in the non-additive FER primarily due to three reasons: (1) The Jacobian that should be part of the reconstitution of free energy components to account for how subsystems interface geometrically is ignored by assuming independent constraints are orthogonal. For example, the very skewed configurations shown in Fig. 1B  (3) Information is lost by coarse-graining structure into local configurations (identified by the  index). For example, when 1.5321 ba  the flawed prediction in energy (~ 10% relative error as listed in Table 2) occurs because when just one fluctuating-interaction is present it can stretch between multiple configurations. In addition, the building blocks are not commensurate, leading to strain energy. When both fluctuating interactions are present in the system, only the problem of strain energy remains.
Partitioning local structure into finer coarse-grained bins to define accessible configurations with more restricted range of motion systematically diminishes all three sources of error. A high degree of accuracy can be obtained by finely coarse-graining the energy landscape (such as a harmonic well) by a large number of flat energy tiers differing by tiny increments (say 0.01 kcal/mol). As the quantities   , ,, xx x x QFU  are further refined for each interaction type, x , this will create more binning labels that comprise the FED. Although defeating the objective of rapidly calculating absolute free energies, it is important to note that errors can be reduced to low levels in principle. Also important is that the computational cost for the non-additive FER scales linearly with respect to the number of atoms in the network, N , with a pre-factor that is proportional to the number of coarse-grained bins used. Since exact integration scales as 3N   , consideration of a sophisticated FED may be worth the effort.
The errors caused by large strain energies in frustrated configurations can be identified and removed from the ensemble in applications to proteins based on the empirical justification that proteins exhibit folding funnels because they are minimally frustrated [Onuchic & Wolynes, 2004]. In practice, this is accomplished by considering only native contacts with respect to a specified template structure that is obtained experimentally (say from X-ray crystallography) or from a model structure that is fully relaxed. This leads to a FED scheme that classifies protein structure in terms of a finite number of local energy basins such as accessible backbone conformations within a Ramachandran plot and sidechain rotamers for residues. Moreover, a variety of different types of H-bonds can be classified. The complete classification of local structure defines the set of all possible subsystems that can appear within a protein. Then, the minimum of the potential energy of a basin is used to obtain the conformational part of the free energy, with the free energy contributions from modes of vibration augmented. Consequently, the x U and x  parameters for the various basins are temperature independent. Notice that for the quadrilateral example, the temperature dependence in x U and x  appears because of the harmonic potential energy, as seen in Fig.   2C and 2D. Thus, the free energy of a subsystem is separated into conformational and vibrational parts, such that ( ) Only when x RT    does the equipartition theorem apply. More generally, ( ) vib xx F  is empirically modelled as the free energy of a quantum harmonic oscillator with natural frequency, x  . An observation that can be seen by comparing the harmonic and flat potential energy example cases is that the free energy contributions from vibration originate only from independent modes. A subsystem (in three dimensions) with n atoms for 3 n  has 3 6 n  independent modes of vibration, and one independent mode when 2 n  . At this point, an empirical interfacial thermodynamic model can be developed. www.intechopen.com

Interfacial thermodynamics model for protein stability
The previous section showed how to reconstitute conformational free energy for a given constraint topology or framework using network rigidity. The staring information is that all energy and entropy components for accessible subsystems are stored in lookup tables. The focus was on internal DOF of the system. Now the affect of solvent on a protein will be included. Solvent DOF are external to the protein and need not be kept track of because they are part of a reservoir. Therefore, treating all free energy components from a reservoir as additive is consistent with a thermodynamic hypothesis. The problem that is at hand now is to determine the partition function of a protein, which takes the generic form: The index, , defines an accessible configuration that is generated from a template structure decorated with Ising-like spin variables to specify the local environment of each residue. Through coupling, the template decoration helps define a mechanical framework, which is specified by energy basins, labelled by , and its member distance constraints labelled by j, across all subsystems. The placement of distance constraints is specified by the j n  values. Topological information about the mechanical framework that is contained in  serves as input to a graph-rigidity analysis that yields the j q  values. Because the number of distance constraints and modes of vibration are equal within a subsystem, the j index is reused to label modes 1 . In Eq. (3) involving random variables, it is understood that whenever j n  =0, so does j q  =0, since if a distance constraint is not present, it cannot be independent. The j q  in the last term are necessary because the free energy of vibration is reconstituted by adding only independent modes of vibration. The term, slv Q  , takes on a form similar to many FED schemes commonly employed in the literature that relate transfer free energies to estimate changes in free energy of residues and other designated chemical groups based on whether they are exposed or buried in the protein through solvent exposed surface area.

Free Energy Decomposition (FED)
The FED accounts for enthalpy 2 and entropy contributions from solvent, conformation and vibration. The geometry of a protein is defined by one or more template structure(s). Given any template structure, all its atoms are partitioned into contiguous groups of atoms that are classified and parameterized by the FED. As such, each atom within a system must map to one and only one molecular constituent, which also serves as a primary subsystem. Molecular constituents in proteins define the residues, as illustrated in Fig. 4.
The FED considered here consists of 1) residues, 2) covalent bonds that link residue pairs, 3) H-bonds and 4) hydration interactions that together constitute a constraint network, and, the additive contributions consist of 5) residue solvation and 6) hydrophobic interactions that together model solvent effects. To account for protonation states on titratable residues, an additional solvent dependent partition function, ion Q  , must be inserted in Eq. (3), but this is not discussed here because it introduces technical complexity without offering anything more conceptually. Packing interactions are implicitly included. Because the FED can divide a system up in different ways, and because of the empirical assignments, different effects can be lumped together in various terms. Here packing effects are included in the residue states that are identified as native-like or disordered, corresponding to good or poor atomic packing with respect to the strain free template. With exception of long-range electrostatic interactions, and electing to work with a fixed protonation state, the six listed types of contributions encompass all essential enthalpy-entropy compensation mechanisms.  4. A schematic of a protein template structure is shown consisting of 16 residues. Each residue defines a subsystem that includes all atoms from its backbone and sidechain. The model can include a sufficient number of probable low energy basins for each residue type covering the most frequently occurring conformations identified in the Ramachandran plot and the sidechain rotamers. The degree of coverage depends on the level of coarse graining, which ultimately controls the accuracy of the model and speed of the calculations.

The free energy functional
Solving Eq. (3) poses insurmountable challenges. Therefore, Eq. (3) is reformulated as a free energy functional (FEF) that can be efficiently solved numerically using self-consistent mean field theory. The FEF in generic form is written in a format that is germane to the concept of a FED, where various types of contributions can be identified. for conformational constraints that are externally imposed on the protein structure due to solvent molecules ---often described as forming a clathrate-structure, and etc cnf G indicates that the model can be extended if needed, such as modelling packing interactions explicitly.
The FEF is expressed in terms of a set of a priori unknown probability functions describing the microstates of the protein. The exact nature of what the microstates are will depend on the FED. In addition to the various FED terms that make up the FEF, order parameters are employed to define the macrostate of a protein that reflect sub-ensembles of microstates. By minimizing the FEF under the global constraints imposed by the order parameters, a free energy landscape (FEL) is calculated. The first step is to define the FED based on microscopic mechanisms deemed important to model, which naturally leads to defining variables and their associated probability functions. The second step is to define the order parameters that will be used to define the FEL. The third step is to solve the FEF. How to solve the FEF will be explained below in a specific context of the FED. The task at hand now is to define the FED in terms of enthalpy-entropy compensation mechanisms essential to protein stability.

Solvent related enthalpy-entropy compensation mechanisms
Residue solvation: A residue can be buried (b) in the core of a protein without solvent contact, or it can be exposed to solvent. When exposed, the solvent molecules surrounding the residue might be mobile (m) or structured clathrate (c). Each residue is assigned a solvation state, s , to characterize its local environment, where {, ,} sb m c  . Residue solvation together with the given template structure is used to specify a microstate of the protein. The ensemble of all accessible solvent states for a given template with n residues consists of 3 n configurations.
The solvent state decorates the template structure. For example, Fig. 5 illustrates a decoration of the template structure shown in Fig. 4 (5) is the standard expression for the free energy of a system comprised of independent subsystems, where the mixing entropy is accounted for in the last term. At this point, each residue is able to independently explore three solvation states in thermodynamic equilibrium. However, as more terms are added to the FEF, these states will become coupled in the same way spin-spin coupling occurs in Ising or Potts models. In the FED considered here, the set of functions { rs p } will form the basis for completely representing the FEF as an Ising-like model with generalized spin-spin coupling terms. The coupling terms that are described next account for interactions at the interface between subsystems, which are the molecular constituents defined by the residues. Fig. 5. A schematic of a protein template that is decorated by specifying one of three possible solvation states for each residue. Buried means that some part of a residue is not in contact with solvent. The contributions of free energy, enthalpy and entropy in the buried state are proportional to the solvent accessible surface area of the residue. Exposed solvation states have maximum solvent accessible surface area. The difference in these quantities between buried and exposed states will be less for corner residues {4,7,10,13} compared to the other surface residues {3,5,6,8,9,11,12,14}. A maximum difference occurs for core residues {1,2,15,16}, which can become exposed to solvent due to solvent penetration. Recall the template defines a fixed topology, but not a fixed geometry. This type of coarse-grained description of solvation is common to Ising-like models.
Hydrophobic interaction: The change in free energy to transfer water from an interface that separates neighboring molecular constituents into bulk solvent is how the hydrophobic effect is modeled in the FEF. This interface term is illustrated in Fig. 6A is an estimate for the number of water molecules that could reside at the interfacial surface between residues 1 r and 2 r based on the geometry of the template structure. Note that 12 , wat rr n is the interfacial surface area divided by the specific area covered by a single water molecule. The parameters, (, ) hph hph , represent the (energy, entropy) contributions to the free energy for transferring one water molecule from a generic non-polar reference environment to bulk solvent.
An interesting property of Eq. (6) is that the accumulated strength of the hydrophobic effect is proportional to the total surface area of the buried-buried interfaces that snake through a protein. The nature of these interfaces depends on the solvation state of the protein. Also a significant part of the overall strength of the hydrophobic effect is due to the chemical nature of the bulk solvent (i.e. affecting chemical potential), which is reflected in the parameters (, ) hph hph  by expressing them as functions of co-solvent concentrations. In aqueous solution a thermodynamic force is generated to expel water from the core of a globular protein, thereby resisting water penetration. The hydrophobic effect competes against the desire for residues of all types (hydrophobic or polar) to be solvated. The nuanced details of the solvation properties of each residue type combined with where residues are located in the template structure determines the amount of "dry" or "wet" interfacial surface area, and this directly relates to water penetration pathways associated with partial unfolding events. . Triangles represent residues and indicate a relative orientation. (C) Neighboring pair of residues in buried and exposed-mobile (green block) states is shown. When mobile solvent surrounds an exposed residue; a fluctuating intramolecular H-bond (red line) can form between it and a buried residue, or a fluctuating solvent-protein H-bond can form (short red arrow) between the buried residue and solvent.
(D) An exposed residue surrounded by clathrate water (blue block) prevents a H-bond to form between it and a neighbouring buried residue because the immobile water molecules cannot properly rearrange, and thereby shields the residue from intramolecular H-bonding.

www.intechopen.com
Solvent-protein H-bond: H-bonds appear in the FEF in both the solvation and conformational parts of the FED. When a H-bond forms between two neighboring residues in a template structure, the model must account for nine cases corresponding to each of the residues being in one of its three possible solvation states. First, the intramolecular H-bond will be present when both residues are buried, making it impossible for another H-bond to form between solvent and to that particular buried region of the protein. If both residues are exposed, the question about solvent-protein H-bonding to these residues is irrelevant because the residue solvation free energy fully accounts for theses interactions. The cases that generate a solventprotein H-bond are when one residue is exposed to solvent, while its neighboring residue is buried. The discontinuity in local environment creates a surface term at a wet-dry interface between the two subsystems. In particular, the intramolecular H-bond can remain in tact, or be replaced by a solvent-protein H-bond that forms between solvent and the buried residue.
In all, Fig. 6B, 6C and 6D summarizes 7 cases. The three cases that involve intramolecular Hbonding will be addressed below when considering the conformational part of the FED. However, because two cases are coupled dealing with fluctuations between solvent-protein and intramolecular H-bonds (see Fig. 6C), another probability function must be introduced to determine if a solvent-protein H-bond or an intramolecular H-bond will form when both options are accessible. Let ihb h p be the probability that the h-th intramolecular H-bond (ihb) identified in the template structure is present. Although it is not difficult to associate the hindex of an identified intramolecular H-bond to its residues that provide donor and acceptor atoms, it does require cumbersome notation to explicitly show this correspondence. Therefore, in formulas that involve the h-index and two residue indices ( 1 r and 2 r ) this correspondence is implied. Then, as another interface term in the FEF, shb slv G is given by: In Eq. (7) the parameters ( , ) shb shb characterize the energy and entropy contributions from solvent-protein H-bonds. These parameters should be dependent on local structural details of the solvent and protein, but the model is based on an implicit solvent. Including myriad structural details would be intractable, but as effective parameters, they depend only on the chemical nature of the bulk solvent. Thus, all that matters is the total number of H-bonds between solvent and the protein, given by shb N . This average is over the ensemble of microstates that represent all fluctuations taking place between intramolecular H-bonds and solvent-protein H-bonds. Therefore, the second term is the mixing entropy associated with intramolecular H-bonds forming and breaking. The average number of H-bonds in Eq. (8) is expressed as a simple sum over the probability of finding a solvent-protein H-bond, which is then expanded out in detail corresponding to the four terms shown in Fig. 6C.

Entropy spectrum for molecular constituents and interfacial subsystems
As outlined in section 2, subsystems are coarse-grained into configurations corresponding to low energy basins, each with the same number of distance constraints but with particular www.intechopen.com characteristics. The number of distance constraints is just enough for the subsystem to be isostatically rigid, meaning no distance constraint is redundant when the subsystem is isolated. Suppose subsystems 1 and 2 with n 1 and n 2 atoms are connected together to form a larger rigid system with (n 1 + n 2 ) atoms through an interaction at their interface involving n atoms from each subsystem. This interfacial interaction is modelled using 3(2n) -6 distance constraints 3 so it too is isostatic when isolated. Within the combined system, there will be 3(n 1 + n 2 ) -6 independent distance constraints. However, the two subsystems together with the interfacial interaction produce 3(n 1 + 2n + n 2 ) -18 constraints, leading to 6n -12 redundant constraints in the combined system. Therefore, interfacial subsystems will generally create redundant constraints as molecular constituents are coupled through interactions. Interestingly, peptide bonds along the backbone that join residues together do not generate redundant constraints. Thus, a random coil (no crosslinks) has no redundant constraints.
A complete entropy assignment to all distance constraints can be made for each basin by performing a local all-atom sampling using a quasi-harmonic approximation. This means that absolute entropies are estimated from the Schlitter entropy formula based on the covariance matrix of atomic fluctuations (Andricioaei & Karplus, 2001) that is obtained from an all-atom MD simulation using an accurate molecular mechanics force field. Afterwards, by considering contributions from all accessible basins, the intrinsic free energy of a subsystem can be reconstituted. This method applies to residues ) and all interfacial subsystems. While it is important that a robust procedure to determine entropy and energy parameters for the conformational part of the FED has been established, for my discussions here it suffices to know that the parameters concretely exist with entropy values on an absolute scale! Furthermore, it is possible to model each mode of vibration within an energy basin of a subsystem using a specific distribution of distance constraints. However, a surprisingly simple description can be made without invoking any details about how the distance constraints are distributed.
For the purpose of explaining the interfacial thermodynamics paradigm, network rigidity will be described in terms of Maxwell constraint counting (Whitely, 2005). Maxwell constraint counting (MCC) is a mean field approximation to graph-rigidity. As shown previously, the application of MCC to solve the mDCM yields results that capture the correct qualitative thermodynamic response in beta-hairpins (Jacobs & Fairchild, 2007a), alpha-helices (Vorov, et al. 2009) and proteins (Vorov, et al. 2011). The advantage of using MCC is that concepts can be calculated easily without obfuscation because all topological details about how the constraints are distributed in the network are ignored. The only specification required is that a subsystem with n atoms has 3n -6 distance constraints, each assigned an entropy value of j  for j=1 to n, such that . This ordered set of values from lowest to greatest defines an entropy spectrum for a subsystem. All of the different types of subsystems that make up the conformational part of the FED are now described.

Conformation related enthalpy-entropy compensation mechanisms
Molecular constituents: Residues are molecular constituents that define subsystems involving a certain number of atoms. Each residue, r, in a specific energy basin, , has its own entropy spectrum. While it is possible to have a detailed description of each residue by using a large number of energy basins, an Ising-like model similar to the mDCM is described here, where each residue is classified as native-like or disordered. A conditional dependence as to whether a residue is native-like (=n) or disordered (=d) is tied to its solvation state. If a residue is exposed to solvent, it is modelled as disordered. However, a buried residue can be nativelike or disordered. Native-like implies the local geometry of a residue will be similar to the template structure, and disordered implies poor atomic packing that is reflected by higher energy and entropy. Respecting conditional dependences on solvation, res cnf G is given by: In Eq. (9) the top term contributes when residue, r, is exposed with probability (1 ) rb p  or when it is buried and disordered with probability, , define the entropy spectrum for residue, r, when it is in the disordered state, d. Once residue, r, couples to other residues within the protein, some of its distance constraints may become redundant. Recall redundant constraints do not contribute to conformational entropy. Therefore, rdj q is the probability that distance constraint, j, in residue, r, and in its disordered state, d, is independent. The probability for residue, r, to be native-like when it is buried, is given as , and rnj q is the probability that distance constraint, j, in residue, r, and in its native-state, n, is independent. The last term involving the square brackets is the mixing entropy for the buried residue, r, to be either native-like or buried.
Taken together, res cnf G supports the following possibilities: A large number of buried residues that are mostly disordered correspond to a collapsed state driven by the hydrophobic effect. As more native-like residues form, but with high variance, the protein will transition to a molten globular state with fluctuating secondary structure. When the majority of buried residues are native-like, the protein will be in its native-state with some degree of flexibility. Thus, all common known phases can be described by the accessible microenvironments.

Covalent bond linkers:
When a covalent bond links two residues, it involves two atoms in each of the two residues. Therefore, the covalent bonds are modelled as an interfacial subsystem containing 4 atoms connected by 6 distance constraints. The parameterization of the distance constraints for a covalent bond with a flexible dihedral angle such as a www.intechopen.com disulphide bond, or a peptide bond with a fixed dihedral angle within a trans or cis basin will have distinct energy and entropy parameters. Since chemical reactions involving the breaking or forming of a covalent bond is not considered here, these interactions are quenched. Moreover, only one energy basin for each type of covalent bond is considered here. Then, lnk cnf G is given by: Here cnf k  is the energy, {} cnf kj  is the entropy spectrum, and, cnf kj q is the probability that the jth distance constraint for the k-th covalent bond in the protein is independent. It is worth noting that for peptide bonds, the entropy spectrum will consist of such low entropy values that a graph-rigidity analysis augmented by the preferential rule of placing lowest entropy distance constraints first, yields cnf kj q = 1 always. In this case, a constant contribution will always come from peptide bonds, rendering its affect on thermodynamic response. As such, the parameters for peptide bonds are unnecessary to specify. In contrast, for a rotatable covalent bond, usually 5 out of 6 distance constraints can be "frozen" out. However, the sixth distance constraint characterizes tolerances in the torsion-angle. Note that even using one energy basin to model a rotatable covalent bond can affect thermodynamic response because depending on other distance constraints in the network, the 6 cnf k q probability need not be 1 always, and thus not frozen out. For more accuracy, more than one basin can be considered of course, which would also provide variation in the energy if the energy basins correspond to frequent and rare angular ranges.

Intramolecular H-bonds:
An intramolecular H-bond (IHB) identified in the template structure between residues 1 r and 2 r (labeled by the h-index) will be present when both residues are buried. When one residue is buried and the other is exposed to mobile solvent there is a probability ihb h p for this IHB to be present, otherwise it will be broken as shown in Fig. 6C.
When both residues are buried as shown in Fig. 6B, there are four cases to consider because each buried residue may be native-like or disordered. Consolidating the four cases into two leads to either both residues are native-like, which defines a native IHB, otherwise a The form of Eq. (11) follows a general pattern that applies to all conformational interactions. That is, there is a probability for the interaction to be present, possibly in a specific state, and under this local condition modelled by a particular basin, it contributes energy that adds to the system. Moreover, the conformational entropy that it contributes is non-additive due to network rigidity, which is used to calculate the probabilities for distance constraints to be independent. In the top term of Eq. (11), the hnj q and hdj q give the probabilities for the h-th IHB in the native and disordered basins respectively. Notice that the hdj q in the lower term is the same as in the top term. When Eq. (11) is considered together with Eq. (7) and Eq. (8), we see the protein structure must balance enthalpy-entropy compensation in the hydrogen bond network of the protein versus forming H-bonds to solvent.

Hydration interaction:
The hydration interaction is introduced to model the affect of aqueous solvent on a residue when it is exposed to a clathrate environment. In this case, the water molecules surrounding a residue form a rigid motif that manifests itself as a mechanical clamp on the residue. Many distinct molecular configurations can lead to a rigid motif. As such, a large number of basins are needed to fully characterise the structure of water molecules around a residue. However, this detailed information is difficult to obtain, and considering the level of coarse-graining that has been made in regards to residue solvation, and to the native-like and disordered classifications of a residue with respect to a template structure, it is appropriate to employ a single basin. Following the general pattern, hyd cnf G is given by: In Eq. (12) the energy term is not included because it is already accounted for in the clathrate solvation energy parameter. The additive part of the solvation entropy parameter accounts for the solvent DOF having a reduction in conformational entropy. But mechanical clamping from the clathrate structure reduces conformational flexibility and entropy of the residue. This clamping is modelled by 3n-6 distance constraints to ensure the residue of n atoms is isostaticly rigid. A degenerate conformational entropy parameter is given by hyd  , and rcj q gives the probability for the j-th distance constraint to be independent, where the c-subscript denotes clathrate. A reduction in conformational entropy at low temperatures is a critical mechanism to understanding cold denaturation, as shown in previous work . Technically, this clamping effect could also be present for an exposed-mobile environment, and using the same modelling scheme but with different parameters would allow for different types of clamping depending on the details of the local water structure. Again, as found previously, breaking up this complicated many body interaction of water molecules on the protein into two simple states (mobile verses clathrate) proves sufficient.
Network rigidity: The non-additive reconstitution of total conformational entropy involving all entropy components is a critical aspect of the FEF. First note that the labelling scheme for component entropies across various interaction types is quite cumbersome, although useful for distinguishing their roles. In regards to network rigidity calculations, it is convenient to define a parallel indexing scheme. Let c  be the pure entropy of distance constraint, c, which runs from 1 to C to account for all accessible interactions. Note that the total number of distance constraints in a protein is not equal to C. Rather, if all identified subsystems were present simultaneously, which is not physically possible, C is the sum of the numbers of distance constraints in all subsystems defined by the template structure. The labelling of all these accessible entropies fall in sorted order such that 1 cc . That is, { c  } will play the role of an entropy spectrum for a template structure, but not all levels can be occupied.
As part of a general pattern, there is a probability for an interfacial subsystem to be present, and tracing over a chain of conditions as products of probabilities translates to an occupation probability, c p , for an individual distance constraint. Similarly, c q , is the probability that a distance constraint is independent given it is present. With this generic labelling scheme, the network rigidity part of the FEF is compactly stated by the following three formulas: The average number of constraints within a protein is c N , which will generally be greater than the 3n -6 DOF required to position all atoms in a protein. The minimum value that c N can be is 3n -6 corresponding to a protein in an ideal random coil state without any disulphide bonds (i.e. only residues and peptide bonds). Starting from a random coil, as intramolecular H-bonds crosslink the backbone, the number of constraints will increase. Yet the total number of independent constraints, c I , that are present in the protein is conserved across all accessible constraint topologies to maintain a rigid protein. The condition of a rigid protein is required because it means that everything that can be specified is specified within tolerances. This property reflects the completeness of the FED. With the liberated view of rigidity, R provides a lowest upper bound estimate for the total conformational entropy.

Vibrational contributions
The contribution to the FEF from vib G would be straightforward once the frequency is known for all 3n -6 modes. Unfortunately, determining vibrational frequencies for the entire protein is a task that must be avoided to retain computational efficiency in the interfacial thermodynamics model. Therefore, the frequency spectrum is modelled empirically to capture two important features. Vibrational mode frequencies will (increase, decrease) as redundant distance constraints are (added, removed) to a network as it (stiffens, softens). Furthermore, the lowest frequency decreases as the size of the system increases. Assuming a vibrational frequency can be defined for each c-index, vib G is given by: www.intechopen.com To account for the first feature, the entropy spectrum for the system is employed to define a frequency spectrum where max min / cc     to reflect the generic property that frequency is inversely proportional to pure entropy. This relation would be sufficient if the entire protein was employed as the sole constituent, but the pure entropy spectrum is comprised from many small subsystems resulting in a range for c  not reaching sufficiently low frequency. By generalizing the Debye model [Kittel, 1996] for vibrations in a crystalline solid, system size dependence can be better accounted for. In the Debye model, the frequency spectrum is given as: In Eq. (15), the right-most fraction parallels the Debye model, where the numerator defines an effective mode number that ranges from 1 to 3n -6. Mode (3n -6) corresponds to c=1, or the lowest c-index present in the network. As the mode index decreases with increasing c-index, the frequency of vibration lowers in the same way as the Debye model. The front scale factor min / c   (i.e. min 1    ) modifies the dispersion, which is required to span a disparate range of vibrational frequencies typical of proteins. Consequently, as more redundant constraints appear in the network, the lowest frequency mode will be reached sooner on the entropy ladder (smaller c-index) so that the lowest lying vibrational frequencies shift higher without affecting high frequencies. The spectrum for a protein in a random coil will reach the lowest possible frequency. Note that upon protein-ligand binding, frequency shifts that take place due to changes in the constraint network and size of the system are roughly accounted for.

Self-consistent constraint theory
The essence of constraint theory applied to an interfacial thermodynamics model is to determine how a system responds under certain global conditions that are consistent with microscopic heterogenous environments. The previous section constructed a FEF to describe protein stability by building up a hefty collection of free energy terms representing specific enthalpy-entropy mechanisms. This process lead to conjuring up the probability functions { rs p , ihb h p , nat r p , c p , c q } that will be determined self-consistently when solving the FEF. In doing so, it is critically important that heterogeneous micoenvironments throughout the protein are taken into account. An efficient way to solve the FEF is to introduce a number of constitutive equations to transform the functional into a variational problem in parametric form that allows certain global constraints to be imposed. Specifically, a trial function with a number of variables is substituted into the functional to reduce the problem to finding the minimum of a function. These variables correspond to Lagrange multipliers that control the values of selected order parameters. Actually, this method builds upon a previous method used to solve the mDCM (Jacobs & Dallakayan, 2005), but there are critical differences.
First, I highlight key points about the initial method (Jacobs, 2006b) for solving the mDCM. Occupation probabilities { c p } are calculated using Lagrange multipliers to enforce the amount of intramolecular H-bonds and native-like character within a protein. Given { c p }, a constraint network is randomly generated, and a rigidity algorithm is applied with the preferential entropy rule to identify the distance constraints as independent ( 1 ). After collecting s N random samples, an estimate is given as cc qq   . By focusing on distance constraints, all the formulas in Eq. (13) apply, and s N should be at least 200 to obtain a useful estimate for conformational entropy, R . The employed trial function for c p does not depend on whether a distance constraint is independent or not. For this information, c p must be a function of c q . In this latter case, the calculation scheme would then be to assume 1 c q  , calculate c p , determine cc qq   from sampling, then recalculate c p , and iterate this process until the latest values of c p are nearly equal to the previous values of c p within some tolerance to obtain self-consistency.
To simply implement this self-consistent approach using averaging does not work because convergence will not occur when the precision in c q is low. To ensure convergence, s N should be at least 500,000, suggesting a self-consistent calculation is not tractable! Recently, a new algorithm to calculate average network rigidity properties as a probability flow problem describing independent DOF and where DOF absorb onto constraints has been successfully developed [Gonzalez, et al. 2011a, Gonzalez, et al. 2011b]. Now c q is calculated to within numerical precision, making the self-consistent calculation feasible. Also different from the mDCM: The FEF described above has almost all conformational subsystems that involve distance constraints coupled to the solvation states of residues, as well as other interfacial surface terms between the residues that reflect local microenvironments. Tracking these additional details goes far beyond the mean field treatment invoked in the mDCM. Consequently, a very different set of order parameters need to be considered.

Order parameters and the free energy landscape
The free energy of a protein is numerically calculated while subjected to global constraints imposed by order parameters that define a specific macrostate. Scanning over the entire range of order parameters produces the FEL. Since a protein is of finite size, the minimum free energy is not the only point on the FEL of interest. Rather, the entire FEL is of interest because it maps out all the low-lying basins and saddles. The natural variables of the FEF that describe microstates dictate the form of the macrostates, and this determines what order parameters need to be considered. The solvent environment of the residues and whether they are native-like when buried is the only information needed to completely define the microstate of a protein. Therefore, order parameters B , M and N are introduced to specify the macrostate of a protein, giving the total number of residues that are respectively Buried, exposed to Mobile solvent and Native-like. Note that the number of residues in the exposed clathrate state 4 , H , is not an independent variable, since the total number of residues, R , is given by: RBMH   . Furthermore, there is a restriction on N , such that 0 NB . As such, the FEL is expressed as (, , |, ) GB M N T  where the triple dots are a reminder that in addition to temperature, pressure and pH can be directly considered, although not here, and, many parameters in the FEF depend on solvent composition.

Hierarchical application of global constraints
The macrostate (, , ) BMN is associated with the following three constraint equations: respectively. Propensity functions characterize physical and/or chemical properties relevant to their conjugate order parameters in the local environment surrounding residue r, based on the template structure. The local propensity for being buried is () r  , defined as the number of nearest neighbour contacts to residue, r. The greater number of nearest neighbours a residue has in the template structure, the greater resistance to solvent penetration irrespective of its intrinsic solvation character quantified by the slv rs Z factors. For () r  , it is set to 1, so that no differentiation is made between clathrate and mobile exposed microenvironments. The template structure is used to define local microenvironments that individual residues will experience. By adjusting the Lagrange multipliers, the total number of residues buried and exposed to mobile solvent is controlled (at least on average). Note that fluctuations are accounted for in the FEF through the mixing entropy terms. Constraining a protein to macrostate (, , ) BMN corresponds to selecting a sub-ensemble of microstates that share the common property that the total numbers of buried and exposed to mobile residues are B and M . It is seen from Eq. (17)  The form of Eq. (18) is that it has a Boltzmann factor for a native-like state in the numerator, divided by the sum of Boltzmann factors for the native-like and disordered states. This ratio gives the probability that residue, r, will be in the native-state. The Boltzmann factor for the native-like state contains the Lagrange multiplier, N  , to enforce the number of native-like residues in the system to be, N , in accordance with Eq. (16). The form of Eq. (19) is similar, except the numerator is the Boltzmann factor for a disordered IHB, and the denominator is the sum of Boltzmann factors for a disordered IHB and a protein-solvent H-bond. These two possibilities compete head to head, but there is no additional Lagrange multiplier, as this process is not tied to an order parameter.
It is clear from Eqs. (18 and 19) that knowing the probability certain distance constraints in the constraint network are independent is necessary. However, these q-values are initially unknown, and therefore, an iterative self-consistent calculation must be invoked. Note that the probability for a distance constraint to be present is equal to the probability for the basin that it is a member of to be present in the network. For example, all distance constraints used to model the conformational part of the free energy for residue, r, when it is native-like is equal to nat r p . Conversely, the probability of is assigned to all the distance constraints for this residue when it is in a disordered state.
The occupation probabilities, c p , is 1 for quenched constraints, or it is straightforward to get the probabilities from { rs p , ihb h p , nat r p } once they are known. There is, of course, a chicken and egg problem because c q is determined after c p is known, but c q must be known before ihb h p and nat r p can be calculated. The procedure is to guess the initial values of c q , calculate c p , apply a rigidity analysis to obtain c q and then recalculate c p . Iterate this process until the values for both c p and c q converges. Note that these equations converge to the unique solution independent of initial guess. The type of guesses tried include, for each c-index c q set to 1, set to 0, or set to a value between 0 and 1, or independently assign a random number between [0,1]. Notice that c q and c p imply one-dimensional arrays for c=1 to C. In fact, any variable that has one index or more than one index implies an array of values. It is worth mentioning here that convergence is reached typically within 15 iterations using the new rigidity algorithm [Gonzalez, et al. 2011a]. However, MCC yields qualitatively similar results, and as described next, captures the essential features about the role of rigidity.

The entropy spectrum and maxwell constraint counting
The entropy spectrum for an example set of subsystems that can be found within a protein is schematically shown in Fig. 7. Applying MCC with the preferential entropy rule is equivalent to filling the available levels of the entropy spectrum of the system starting from the bottom until the system becomes isostatically rigid. All the distance constraints that are placed in the network before the protein has the minimum number of constraints to become isostatically rigid are considered independent. As more distance constraints are added to the network, they are all redundant. This global and uniform transition point between where the constraints are independent and redundant defines the Maxwell level. . Let M q be the value of c q at the Maxwell level. Then the self-consistent calculations described above amounts to finding a solution in the form of a step function, where the only unknown is where the step is located on the entropy spectrum of the system.  Fig. 7. Schematic of how the entropy spectrums associated with subsystems combine into a single entropy spectrum for the entire system. Green vertical arrows pointing to the # sign indicate that the number of interactions depend on the decorated microstate. The "clamp" interaction derives from clathrate water. The horizontal dark red double arrow indicates a certain number of interactions can be native like (N) or disordered (D), and their respective entropy spectrums are shown. Each residue has a unique entropy spectrum. Residues with (smaller, larger) entropy values are more (rigid, flexible). The protein sequence defining the residue composition is represented as the large horizontal blue double arrows. The system entropy spectrum is characterized by the variables (p c , q c ). From Eq. (20) distance constraints with entropy less than the Maxwell level are independent, and this level slides up and down the spectrum depending on numbers and types of interactions present in the system.  20), the self-consistent solution results in a dramatic impact on the thermodynamic response of the system. This is because in general every constraint in the network is competing against all other constraints that are present in the network. What changes is the number of distance constraints that appear within the protein for different macrostates. As more constraints are added to the network, the entropy drops.

Response of local environments to global demands
The type of intramolecular interactions and their locations within a protein depends on the solvation state of the residues and local environments encoded by the amino acid sequence and template structure. For example, using the fold architecture shown in Fig. 4, and for the solvation decoration shown in Fig. 5, a hydrophobic homo-polymer (HH), a heterogeneous protein (HP) and a polar homo-polymer (PH) are shown in Fig. 8. Fig. 8. Schematic illustration of three sequences with the same architecture and solvent decoration. Hydrophobic residues (squares) do not participate in H-bonding. H-bonding is only allowed between polar residues (circles) that are nearest or next nearest neighbours not linked by covalent bonds. Because of the solvent decoration there is only one hydrophobic interaction, shown as a dark blue diamond. Non-fluctuating intra-H-bonds are shown as solid red lines, and red dashed lines indicate fluctuations between intra-and solvent-protein H-bonds. From left to right the three panels show the HH, HP and PH cases, respectively exhibiting (no, limited, many) intramolecular H-bonds for this solvation microstate. The hydrophobic interactions are identical across the three sequences because they only depend on the transfer of water from buried regions to bulk solvent, which is the same in all cases.
In water, HH will form a collapsed state much like an oil droplet. PH will be soluble and will resemble a random coil because few crosslinks form. HP can potentially produce a rich phase diagram. In the next section, stability curves for all three of these toy polymers will be shown based on model parameters that were adjusted to produce heat and cold denaturation in HP for the purpose to facilitate general discussions. The same parameters are used for all three cases, and they are in a physically reasonable range. However, the toy models are not structurally realistic, therefore, the parameter values (not given here) are not important. Rather, the critical issue at hand is developing a tractable paradigm that can accurately model the complexity of all the coupled interactions within/on a protein, and do this in a computationally efficient way.
Self-consistent constraint theory applied to the FEF determines the microenvironments that emerge as most probable. Although Eq. (3) and Eq. (4) at face value appear to be additive, the myriad coupling between interactions will cause two interactions of identical type placed in different microenvironments and/or under different solvent and thermodynamic conditions to respond differently. In particular, the coupling through network rigidity renormalizes the conformational entropic contributions (via the c q values), and this strongly affects where and how solvent penetrates the protein; thus changing the properties of local environments, which impacts the constraint network. For example, a cluster of H-bonds can form a strong nucleation barrier causing a localized buried region to be highly resistant to solvent penetration compared to a similar buried region without a H-bond cluster. Consequently, non-additive response derives from a chemicophysical feedback loop. This is captured in the process of minimizing the free energy to determine the optimal constraint topology and solvent decoration under the specified thermodynamic and solvent conditions for a given template structure while satisfying Eq. (16) and Eq. (20) for a particular macrostate (, , ) BMN .
phase diagram in terms of how much the protein is buried or exposed to solvent in different forms. Because of the constraint that 1( ) / BMH R   , a Gibbs triangle is employed so that all solvation states can be viewed simultaneously in terms of percentages, as shown in Fig. 9. Fig. 9. The Gibbs triangle serves as the base of a three-dimensional FEL that looks like a wedge when viewed at fixed temperature and using the B , M and N order parameters. That is, 0 N  on the bottom horizontal line when 0 B  , and the tip of the wedge is at 1 B  .
Three stable basins will generically appear in the Gibbs triangle. At low temperatures, it is possible that a free energy minimum will appear when the percent of clathrate structure is dominant. At intermediate temperatures, the number of buried residues in the protein will be dominant, but they can be native-like or disordered. Therefore, a protein may be in either the molten globular 5 or native-fold states. At high temperatures, a molten globular state may remain, or further unfold to allow a majority of residues to become exposed to solvent. Next, the free energy of three representative macrostates is plotted as a function of temperature. Structural phase transitions corresponding to changing free energy basins are like first order transitions, because they occur when stability curves for different states cross one another.

Stability curves: Heat and cold denaturation
The stability curves for the toy polymer cases (HH, HP, PH) shown in Fig. 8 are calculated for three macrostates in the Gibbs triangle that correspond to a dominant characteristic of 5 The degree to which a protein is native-like versus disordered is lost during the process of summing over the native-like order parameter in Eq. (21). However, this information is known from the original 3D free energy landscape. clathrate, buried or mobile, and these curves are plotted in Fig. 10 A, B, C respectively. In fact, the particular decoration shown in Fig. 8 has a much higher free energy than the lowest free energy state, and is for practical purposes a state of measure zero. Nevertheless, the FEF determines the statistical weights for all macrostates in the high dimensional FEL, and the high free energy states around saddles are important in describing transition states.
As expected, the HH case shows that the buried state is the most stable form over the entire temperature range, and thus there is no phase transition. For the PH case, the structure is always exposed to solvent, but it is interesting to note that there can be a structural phase transition in a protein between low and high temperature without it involving a compact folded structure. This result suggests there is a difference between structural properties in the conformational ensembles of a cold-and heat-denatured polymer. Interestingly, the HP case exhibits both cold and heat denaturation. In all likelihood, a protein of this size (16 residues) would not exhibit a phase transition, however, the parameters were optimized to make this situation occur for the HP case. Although the same set of parameters is used for the HH and PH cases, the competing enthalpy-entropy compensation mechanisms within a heterogenous protein (modelled by HP) make it possible for such a rich phase diagram.
The phenomenon of cold denaturation often does not occur because the temperature at which it would take place is too low to be observed. Osmolytes can be used to modify bulk solvent properties to control where the crossing points of the triangle shown in Fig. 10B occur. It is seen in Fig. 10D that the total entropy of a protein increases as a function of temperature. Fig. 10E shows that the order parameter for clathrate solvation content is a monotonically decreasing function of temperature, so that at higher temperatures a greater competition between buried and exposed-mobile states occur. Other order parameters can be easily calculated, such as the number of intramolecular H-bonds or hydrophobic contacts, which are shown in Fig. 10F as tracking one another. Although not shown here, tracking the native contact order parameter allows one to determine if a compact structure is native-like or that of a molten globular. In general, detailed information about solvent penetration and mechanical response of a protein is predicted at fine resolution, and this interplay is very important to protein function [Purkiss, et al. 2001], and these relationships have been more recently been probed experimentally [Kamerzell, et al. 2008;Pais, et al. 2009].

Conformational ensembles in the native and denatured states
Experiments indicate that native structure persist in the denatured states of proteins at low temperature [Shan, et al. 2010] and high temperature in the molten globular state [Shortle, 1999]. Established many years ago, the converse is true: There is appreciable solvent penetration into the native state [Woodward, et al. 1982], while buried secondary structure regions can be very resistant to solvent penetration [DeFlores & Tokmakoff, 2006]. These experimental results suggest to me that using template structures is justified, although this is not to say non-native contacts are negligible. For these cases, multiple templates should be used and these other templates can be computationally generated. Fig. 10. Left column: The stability curves are shown for macrostates corresponding to a majority of residues that are clathrate (black), buried (red) and mobile (green). In all cases the majority contributor is 88%, and the two minority contributors are each 6%. Thus, the macrostates selected are located at the apex of each corner in the Gibbs triangle. In A, B, C, results are given for HH, HP and PH respectively. Right column: For the HP case only, D) shows the heat capacity and entropy, and E & F show the temperature dependence on five different types of order parameters that respectively correspond to the residue solvation states (clathrate, buried, mobile) and numbers of H-bonds and hydrophobic contacts.

Phenomenological modeling of protein stability
The standard thermodynamic analysis for protein stability assumes two states describe an unfolded (U) and folded (F) structure. The mechanism of protein unfolding is hidden in the temperature dependence of UF GG G   . Specifically, G  is a concave quadratic function of temperature with two solutions for () 0 GT   that yields C TT  for cold denaturation and H TT  for heat denaturation. However, the fact that two transition temperatures exist at all implies a minimum of two order parameters are required to describe the phenomena. The order parameters characterize the emergent behaviour of microscopic properties, which inspired the idea of modelling residue solvation states as three possible states. In a similar way, the molten globular state is distinct from the native state with respect to the degree of disorder, which inspired applying the native-like order parameter to define the FEL. Three independent order parameters appear to me as the minimum number of descriptors to describe protein stability consisting of the exposed unfolded state at low temperature, the compact native and molten globular states, and, the exposed unfolded state with little to no residual secondary structure at high temperature. The results presented in Fig. 10 produced three distinct states 6 , indicating the ensemble of conformations of the unfolded state is structurally distinct at low and high temperatures, as demonstrated by the PH example in Fig. 10C. However, the controversial clathrate mechanism is invoked and it appears essential.

Clathrate mechanism for cold denaturation: Fiction or reality?
Models often invoke the clathrate mechanism to describe cold denaturation [Hansen, et al. 1998;Widom, et al. 2003]. The notion of this mechanism is based on interpretations of indirect measurements, which has been scrutinized [Graziano, 2004;Lopez, et al. 2009;Oshima, et al. 2009]. However, the critical review on protein hydration dynamics in solution [Halle, 2004] appears to me to dispel paradoxes that result from the controversy. Although the name clathrate may or may not be misleading, the interfacial thermodynamics model is based on general principles of statistical mechanics, which does not depend on a name. All that matters is the affect on the protein from solvent. The interfacial thermodynamics model accounts for native-like and disordered structure, and solvent penetration due to structural deformations. The model requires a partition function for the ensemble of water configurations around a residue, and an empirical contact term representing the affect on the protein's flexibility. This partition function is surely difficult to calculate from first principles, but a partition function can always be partitioned (hence the name) into a sum of terms. Dividing all configurations into two groups that classify the solvent in contact with a residue based on whether there is a small or large reduction in flexibility, no matter how small the difference may be is a valid mathematical exercise that does not change the physics because no Boltzmann factors are dropped.
The exact partition function, Z , is written as: cm ZZ Z   where c Z sums over all terms that reduces the flexibility in the residue more than the terms summed in m Z , no matter how small of a difference there may be. Therefore, m Z describes a more-mobile water-residue system, and c Z describes a less-mobile water-residue system. Because "more-mobile" and "less-mobile" is cumbersome, I prefer to use the names mobile and clathrate. However, the important point is that it is not the property of water that is more or less mobile, it is the interaction between the water and residue that cause the residue to be more or less mobile, which is the basis of the classification. In fact, this description has been done for the twenty amino acids found in proteins to arrive at all the necessary solvation parameters [Du, et al. 2011]. Therefore, the notion of a clathrate mechanism is a reality, but it may not correspond to the original notion, and it must be calculated in the context of a large ensemble of waterresidue configurations, where the flexibility of the residue must be quantified and assessed. Furthermore, a more refined classification scheme is in principle possible.

Unifying different perspectives on protein stability
Recent work on the thermodynamic response of a system subjected to geometrical constraints [Chen, et al. 2009] suggests that indirect intermolecular correlations, rather than geometric constraints, are the key to achieving a first-order phase transition. Although this latter conclusion was obtained in a different context, it appears the same idea is implied in the interfacial thermodynamics model. In particular, the template structure provides the native-state geometrical constraints, but all subsequent calculations only involve molecular correlations. That is, terms in the FEF couple intramolecular interactions to residue solvation properties as the protein conformation changes, albeit no new geometries are generated. I was surprised that the "standard model" for protein stability and folding was criticized with much scrutiny recently [Ben-Naim, 2011] with several misconceptions highlighted. The main concerns raised were: 1) Free energy landscapes must be used to quantify protein stability, not energy landscapes; 2) non-additivity is an inherit property of entropy; 3) stability differences due to the trade off between intramolecular and protein-solvent Hbonds are too weak to drive protein folding; 4) hydrophobic interactions are also too weak to be the dominant driving force; whereas 5) the hydrophilic interactions are the strong driving forces that fold a protein. These five points and the clathrate mechanism controversy provide an opportunity to exam the assumptions of the interfacial thermodynamics model.
All solvation effects appear either as volume terms for the primary constituents (residues) or as surface terms between the interfaces of these constituents. The volume and surface terms taken together represent hydrophilic and hydrophobic interactions, the clathrate mechanism and protein-solvent H-bonds interactions using implicit solvent. Non-additivity in entropy components is a primary concern of the approach, and it directly deals with the free energy landscape. Moreover, all parameters have thermodynamic interpretations. For example, the parameters (, ) hph hph  for the hydrophobic interaction combine into the chemical potential to transfer a water molecule from a buried region in the protein to bulk solvent. In short, the modelling scheme put forth is complete, with the exception of long-range electrostatics.

Future direction
The particular FED that has been described above to define all the terms in the FEF, and using the simple MCC in the self-consistent constraint theory calculations is but one possible implementation of the interfacial thermodynamics model for protein stability. Similar to the strategy described here, but with finer coarse-graining [Jacobs, 2007b], new software called FAST is being finalized. Flexible to Flexibility proteins to predict myriad thermodynamic and mechanical properties for high-throughput applications. With my collaborator Prof. Dennis Livesay at UNC Charlotte, and our research associates, Dr. Hui Wang and Dr. Chuanbin Du, the software has been designed and coded in C++ from scratch to provide a stable platform to support calculations similar to those described here. The FEF is of a general form that includes accounting for protonation states on titratable residues and explicit packing interactions. Moreover, the FEF is self consistently solved using an accurate rigidity algorithm for which Dr. Luis Gonzalez has helped develop, and he has documented its accuracy over the course of his Ph.D. studies. Many results to be published have been reported at several conferences, such as how model parameters are determined. With the Herculean effort spent on computational methods and optimization by Dr. Wang, FAST exceeds the speed of the mDCM and has greater accuracy. Going forward, the main concern is to find support to finish FAST so it can be released as free software to academic users.

Conclusion
From conception, the interfacial thermodynamics model for protein stability was designed to balance accuracy with computational cost such that it can be applied in high-throughput applications. To meet this pragmatic objective, the fundamental problem of non-additivity of conformational entropy that plagues free energy decomposition schemes has been tackled directly by employing the Distance Constraint Model. In particular, network rigidity is invoked as an underlying long-range interaction that couples entropy components between intramolecular subsystems comprising a protein. The problem is formulated as a free energy functional, and it is numerically solved using constitutive equations and self-consistent constraint theory. While the model makes many approximations, it is able to retain essential elements that describe protein thermodynamics and mechanical properties. Perhaps the best aspect of the interfacial thermodynamics model is that every term is intuitive physically and chemically. Different types of enthalpy-entropy compensation mechanisms can be modeled, and their competing effects can be simultaneously calculated with high efficiency.

Acknowledgement
This work has been supported by NIH R01 GM073082.