Computational Potential Energy Minimization Studies on the Prion AGAAAAGA Amyloid Fibril Molecular Structures

X-ray crystallography, NMR (Nuclear Magnetic Resonance) spectroscopy, and dual polarization interferometry, etc are indeed very powerful tools to determine the 3D structures of proteins (including the membrane proteins), though they are time-consuming and costly. However, for some proteins, due to their unstable, noncrystalline and insoluble nature, these tools cannot work. Under this condition, mathematical and physical theoretical methods and computational approaches allow us to obtain a description of the protein 3D structure at a submicroscopic level. This Chapter presents some practical and useful mathematical optimization computational approaches to produce 3D structures of the Prion AGAAAAGA Amyloid Fibrils, from a potential energy minimization point of view. X-ray crystallography finds the X-ray final structure of a protein, which usually need refinements in order to produce a better structure. The computational methods presented in this Chapter can be also acted as a tool for the refinements.


INTRODUCTION
X-ray crystallography, NMR (Nuclear Magnetic Resonance) spectroscopy, and dual polarization interferometry, etc are indeed very powerful tools to determine the 3D structure of a protein (including the membrane protein), though they are time-consuming and costly. However, for some proteins, due to their unstable, noncrystalline and insoluble nature, these tools cannot work. Under this condition, mathematical and physical theoretical methods and computational approaches allow us to obtain a description of the protein 3D structure at a submicroscopic level. This Chapter presents some practical and useful mathematical optimization computational approaches to produce 3D structures of the Prion AGAAAAGA Amyloid Fibrils, from an energy minimization point of view.
X-ray crystallography finds the X-ray final structure of a protein, which usually need refinements in order to produce a better structure. The computational methods presented in this Chapter can be also acted as a tool for the refinements.
All neurodegenerative diseases including Parkinson"s, Alzheimer"s, Huntington"s, and Prion"s have a similarity, which is they all featured amyloid fibrils (en.wikipedia.org/wiki/Amyloid and references (Nelson et al., 2005;Sawaya et al., 2007;Sunde et al., 1997;Wormell, 1954;Gilead and Gazit, 2004;Morley et al. 2006;Gazit, 2002;Pawar et al., 2005; and references therein). A prion is a misshapen protein that acts like an infectious agent (hence the name, which comes from the words protein and infection). Prions cause a number of fatal diseases such as mad cow disease in cattle, scrapie in sheep and kuru and Creutzfeldt-Jakob disease (CJD) in humans. Prion diseases (being rich in β-sheets (about 43% β-sheet) (Griffith, 1967;Cappaia and Collins, 2004;Daude, 2004;Ogayar and Snchez-Prez, 1998;Pan et al., 1993;Reilly, 2000) belong to neurodegenerative diseases. Many experimental studies such as (Brown, 2000;Brown, 2001;Brown, 1994;Cappai and Collins, 2004;Harrison et al., 2010;Holscher, 1998;Jobling et al., 2001;Jobling et al., 1999;Kuwata et al., 2003;Norstrom and Mastrianni, 2005;Wegner et al., 2002;Laganowsky et al., 2012;Jones et al., 2012;Sasaki et al., 2008;Haigh et al., 2005;Kourie et al., 2003;Zanuy et al., 2003;Kourie, 2001;Chabry et al., 1998;Gasset et al., 1992) have shown that the normal hydrophobic region (113-120) AGAAAAGA of prion proteins is an inhibitor/blocker of prion diseases. PrP lacking this palindrome could not convert to prion diseases. The presence of residues 119 and 120 (the two last residues within the motif AGAAAAGA) seems to be crucial for this inhibitory effect. The replacement of Glycine at residues 114 and 119 by Alanine led to the inability of the peptide to build fibrils but it nevertheless increased. The A117V variant is linked to the GSS disease. The physiological conditions such as pH (Cappai and Collins, 2004) and temperature (Wagoner et al., 2011) will affect the propensity to form fibrils in this region. The 3D atomic resolution structure of PrP (106-126), i.e. TNVKHVAGAAAAGAVVGGLGG, can be looked as the structure of a control peptide (Cheng et al., 2011;Lee et al., 2008). Ma and Nussinov (2002) established homology structure of AGAAAAGA and its molecular dynamics simulation studies. Recently, Wagoner et al. studied the structure of GAVAAAAVAG of mouse prion protein (Wagoner, 2010;Wagoner et al., 2011). Furthermore, the author computationally clarified that prion AGAAAAGA segment indeed has an amyloid fibril forming property (Fig. 1).  is surely and clearly identified as the amyloid fibril formation region, because its energy is less than the amyloid fibril formation threshold energy of -26 kcal/mol (Zhang et al., 2007).

Fig. 1: Prion
However, to the best of the author"s knowledge, there is little X-ray or NMR structural data available to date on AGAAAAGA, which falls just within the N-terminal unstructured region (1-123) of prion proteins, due to its unstable, noncrystalline and insoluble nature. This Chapter will computationally study the molecular modeling (MM) structures of this region of prions.
A new era in the structural analysis of amyloids started from the "steric zipper"-β-sheets (Nelson et al., 2005). As the two sheets zip up, HPs (Hydrophobic Packings) (& vdws) have been formed. The extension of the "steric zipper" above and below (i.e. the βstrands) is maintained by HBs (but there is no HB between the two β-sheets). This is the common structure associated with some 20 neurodegenerative amyloid diseases, ranging from Alzheimer"s and type-II diabetes to prion diseases. For prion AGAAAAGA amyloid fibril structure, basing on the common property of potential energy minimization of HPs, vdws, and HBs, we will present computational molecular structures of prion AGAAAAGA amyloid fibrils.

Hybrid Method of Steepest Descent -Conjugate Gradient with Simulated Annealing
X-ray crystallography finds the X-ray final structure of a protein, which usually need refinements using a simulated annealing protocol in order to produce a better structure. Thus, it is very amenable to use simulated annealing (SA) to format the models constructed. Zhang (2011aZhang ( , 2011d presents a hybrid method of global search SA with local steepest descent (SD), conjugate gradient (CG) search. The hybrid method is executed with the following three procedures. (1) Firstly the SD method and then the CG method are executed. These two local search methods are traditional optimization methods. The former has nice convergence but is slow when close to minimums. The latter is efficient but its gradient RMS and GMAX gradient (Case et al., 2010) do not have a good convergence. (2) When models cannot be optimized further, we employ standard SA global search procedure. (3) Lastly, the SD and CG methods are used to refine the models. The PDB (Berman et al., 2000) templates used in (Zhang, 2011a(Zhang, , 2011d are 2OKZ.pdb, 2ONW.pdb, 2OLX.pdb, 2OMQ.pdb, 2ON9.pdb, 2ONV.pdb, 2ONA.pdb, 1XYO.pdb, 2OL9.pdb, 2OMN.pdb, 2ONX.pdb, 2OMP.pdb, 1YJP.pdb of (Sawaya et al., 2007), but only the 2OMP and 1YJP template-based three MM-Models ( Fig. 6a~6c in (Zhang, 2011a)) are successfully passed through the SDCG-SA-SDCG computational procedures. Zhang et al. (2011aZhang et al. ( , 2011d used 3FVA.pdb as the pdb template to build two MM-Models (Figs. 11~12 in ). The Models were built using a hybrid SA Discrete Gradient (DG (Bagirov et al., 2008)) method. Then the Models were optimized using SDCG-SA-SDCG methods as in (Zhang, 2011a). Zhang et al. (2011Zhang et al. ( , 2011d used 3NHC.pdb, 3NVF/G/H/E.pdb templates to build several MM-Models (Figs. 9~11 in , and Figs. 5~8 in (Zhang, 2011b)). These Models were built in the use of canonical dual global optimization theory Gao and Wu, 2012;Gao, 2000) and then refined by SDCG-SA-SDCG methods as in (Zhang, 2011a).

New Material
This Chapter uses a suitable pdb file template 3NHD.pdb (the GYVLGS segment 127-132 from human prion with V129 (Apostol et al., 2010) from the Protein Data Bank to build MM-Models of AGAAAAGA amyloid fibrils for prions.

Computations
The computational methods used to build the new MM-Models will be simulated annealing evolutionary computations (SAEC), where SAECs were got from the hybrid algorithms of (Abbass et al., 2003) by simply replacing the DG method by the SA algorithm of  and numerical computational results show that SAECs can successfully pass the test of more than 40 well-known benchmark global optimization problems (Zhang, 2011c).

New MM-Models
The atomic-resolution X-ray structure of 3NHD.pdb is a steric zipper, with strong vdw interactions between β-sheets and HBs to maintain the β-strands (Fig. 2). By observations of the 3rd column of coordinates of 3NHD.pdb and Fig. 2, G(H) chains (i.e. β-sheet 2) of 3NHD.pdb can be calculated from A(B) chains (i.e. β-sheet 1) by Eq. 1 and other chains can be calculated by Eqs. 2~3: Basing on the template 3NHD.pdb from the Protein Data Bank, three prion AGAAAAGA palindrome amyloid fibril models -an AGAAAA model (Model 1), a GAAAAG model (Model 2), and an AAAAGA model (Model 3) -will be successfully constructed in this Chapter. Because the template is a segment of 6 residues, the three shorter prion fragments are selected. This Chapter does not perform calculations on the full AGAAAAGA. Chains AB of Models 1~3 were respectively got from AB chains of 3NHD.pdb using the mutate module of the free package Swiss-PdbViewer (SPDBV Version 4.01) (http://spdbv.vital-it.ch). It is pleasant to see that almost all the hydrogen bonds are still kept after the mutations; thus we just need to consider the vdw contacts only. Making mutations for GH chains of 3NHD.pdb, we can get the GH chains of Models 1~3. However, the vdw contacts between Chain A and Chain G, between B chain and H chain are too far at this moment (Figs. 3~5).   The vdw contacts of atoms are described by the LJ potential energy: where ε is the depth of the potential well and σ is the atom diameter; these parameters can be fitted to reproduce experimental data or deduced from results of accurate quantum chemistry calculations. The (σ/r) 12 term describes repulsion and the -(σ/r) 6 term describes attraction. If we introduce the coordinates of the atoms whose number is denoted by N and let ε=σ= 1 be the reduced units, the Eq. 4 becomes into f(x) = 4 ∑i=1 N ∑j=1,j<i N (1/tij 6 -1/tij 3 ), where tij = (x3i−2 -x3j−2) 2 + (x3i−1 -x3j−1) 2 + (x3i -x3j) 2 , (x3i−2, x3i−1, x3i) is the coordinates of atom i, N≥2. The minimization of LJ potential f(x) on R n (where n = 3N) is an optimization problem: Similarly as Eq. 4, i.e. the potential energy for the vdw interactions between β-sheets: VLJ(r) = A/r 12 -B/r 6 , the potential energy for the HBs between the β-strands has the formula VHB(r) = C/r 12 -D/r 10 , where A, B, C, D are given constants. Thus, the amyloid fibril molecular modeling problem can be deduced into the problem to solve the mathematical optimization problem Eq. 6. Seeing Fig. 6, we may know that the optimization problem Eq. 6 reaches its optimal value at the bottom of the LJ potential well, where the distance between two atoms equals to the sum of vdw radii of the atoms. In this Chapter, the sum of the vdw radii is the twice of the vdw radius of Carbon atom, i.e. 3.4 angstroms. The optimization problem Eq. 6 is a nonconvex complex optimization problem. By the observation from Fig. 6, we may solve its simple but equal convex-and-smooth least square optimization problem (or the so-called distance geometry problem or sensor network problem) with a slight perturbation if data for three atoms violate the triangle inequality. The following three optimization problems for Models 1~3 respectively are: Because Eqs. 9~11 are optimization problems with 6 variables only and these optimization problems are to minimize fourth-order polynomials, the proposed SAEC method and other computational methods can easily get the same optimal solutions to optimize the above three models.

The Practical LBFGS Quasi-Newtonian Method
Energy minimization (EM), with the images at the endpoints fixed in space, of the total system energy provides a minimum energy path. EM can be done using SD, CG, and LBFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno). SD is robust and easy to implement but it is not most efficient especially when closer to minimum. CG is slower than SD in the early stages but more efficient when closer to minimum. The hybrid of SD-CG will make SD more efficient than SD or CG alone. However, CG cannot be used to find the minimization energy path, for example, when "forces are truncated according to the tangent direction, making it impossible to define a Lagrangian" (Chu et al., 2003). In this case, the powerful and faster quasi-Newtonian method (e.g. the LBFGS quasi-Newtonian minimizer) can be used (Chu et al., 2003;Liu and Nocedal, 1989;Nocedal and Morales, 2000;Byrd et al., 1995;Zhu et al., 1997). We briefly introduce the LBFGS quasi-Newtonian method as follows.
Newton"s method in optimization explicitly calculates the Hessian matrix of the secondorder derivatives of the objective function and the reverse of the Hessian matrix (Dennis et al., 1996). The convergence of this method is quadratic, so it is faster than SD or CG. In high dimensions, finding the inverse of the Hessian is very expensive. In some cases, the Hessian is a non-invertible matrix, and furthermore in some cases, the Hessian is symmetric indefinite. Qusi-Newton methods thus appear to overcome all these shortcomings.
Quasi-Newton methods (a special case of variable metric methods) are to approximate the Hessian. Currently, the most common quasi-Newton algorithms are the SR1 formula, the BHHH method, the widespread BFGS method and its limited /low-memory extension LBFGS, and Broyden's methods (http://en.wikipedia.org/wiki/Quasi-Newton_method). In Amber (Case et al., 2010) and Gromacs (van der Spoel et al., 2010), LBFGS is used, and the hybrid of LBFGS with CG -a Truncated Newton linear CG method with optional LBFGS Preconditioning (Nocedal and Morales, 2000) -is used in Amber (Case et al., 2010).

New Thinking about the Construction of 3D-Structure of a Protein
If a NMR or X-ray structure of a protein has not been determined and stored in PDB bank yet, we still can easily get the 3D-structural frame of the protein. For example, before 2005 when we did not know the NMR structure of rabbit prion protein, we could get its homology model structure using the NMR structure of the human prion protein (PDB id: 1QLX) as the template (Zhang et al., 2006). We may use the homology structure to determine the 3D-structural frame of a protein when its NMR or X-ray structure has not been determined yet. The determination is an optimization problem described as follows.
We solve Eqs. 15~16 (taking the average of the coordinates of Cα atoms as initial solutions), getting their optimal solutions w1 = (-10.751, 6.428, 1.411) T , w2 = (-7.960, 4.579, -2.256) T respectively (Fig. 10). We may use the vectors w1, w2 and Eq. 12 to construct Chains GH and then build an optimal Model 1 (Aqvist, 1986;Abagyan and Maiorov, 1988;Orengo et al., 1992;Young et al., 1999;Foote and Raman, 2000). In (Burkowski, 2009: pages 213-216), wx 2 + w y 2 + wz 2 =1 (i.e. w is a unit vector) is restrained and Eq. 13 becomes into a problem to seek the smallest eigenvalue (S*) and its corresponding eigenvector w of the following matrix: This matrix is symmetric and positive definite, and its eigenvectors form an orthogonal basis for the set of atoms under consideration. In physics, it is called the inertial tensor involving studies of rotational inertia and its eigenvectors are called the principle axes of inertia. Furthermore, we may also notice that Eq. 13 can be rewritten as min (∑i=1 N ||di|| 2 ) 2 subject to w T w=1, where ||di|| 2 = (a (i) x 2 + a (i) y 2 + a (i) z 2 ) (wx 2 + w y 2 + wz 2 ) -( a (i) x wx + a (i) y wy + a (i) z wz ) 2 . Thus, Eq. 17 can be easily solved by the canonical dual global optimization theory Gao and Wu, 2012;Gao, 2000), by the ways of solving the canonical dual of Eq. 17 or solving the quadratic differential equations of the prime-dual Gao-Strang complementary function Gao and Wu, 2012;Gao, 2000) through some ordinary or partial differential equation computational strategies.

CONCLUSIONS
To date the hydrophobic region AGAAAAGA palindrome (113-120) of the unstructured N-terminal region (1-123) of prions has little existing experimental structural data available. This Chapter successfully constructs three molecular structure models for AGAAAAGA palindrome (113-120) by using some suitable template 3NHD.pdb from Protein Data Bank and refinement of the Models with several optimization techniques within AMBER 11. These models should be very helpful for the experimental studies of the hydrophobic region AGAAAAGA palindrome of prion proteins (113-120) when the NMR or X-ray molecular structure of prion AGAAAAGA peptide has not been easily determined yet. These constructed Models for amyloid fibrils may be useful for the goals of medicinal chemistry. This Chapter also introduces numerous practical computational approaches to construct the molecular models when it is difficult to obtain atomic-resolution structures of proteins with traditional experimental methods of X-ray and NMR etc, due to the unstable, noncrystalline and insoluble nature of these proteins. Known structures can be perfectly reproduced by these computational methods, which can be compared with contemporary methods. As we all know, X-ray crystallography finds the X-ray final structure of a protein, which usually need refinements using a SA protocol in order to produce a better structure. SA is a global search procedure and usually it is better to hybrid with local search procedures. Thus, the computational methods introduced in this Chapter should be better than SA along to refine X-ray final structures.