## 1. Introduction

X-ray crystallography, NMR (Nuclear Magnetic Resonance) spectroscopy, and dual polarization interferometry, etc are indeed very powerful tools to determine the 3D structure of a protein (including the membrane protein), though they are time-consuming and costly. However, for some proteins, due to their unstable, noncrystalline and insoluble nature, these tools cannot work. Under this condition, mathematical and physical theoretical methods and computational approaches allow us to obtain a description of the protein 3D structure at a submicroscopic level. This Chapter presents some practical and useful mathematical optimization computational approaches to produce 3D structures of the Prion AGAAAAGA Amyloid Fibrils, from an energy minimization point of view.

X-ray crystallography finds the X-ray final structure of a protein, which usually need refinements in order to produce a better structure. The computational methods presented in this Chapter can be also acted as a tool for the refinements.

All neurodegenerative diseases including Parkinson’s, Alzheimer’s, Huntington’s, and Prion’s have a similarity, which is they all featured amyloid fibrils (en.wikipedia.org/wiki/Amyloid and references (Nelson et al., 2005; Sawaya et al., 2007; Sunde et al., 1997; Wormell, 1954; Gilead and Gazit, 2004; Morley et al. 2006; Gazit, 2002; Pawar et al., 2005; and references therein). A prion is a misshapen protein that acts like an infectious agent (hence the name, which comes from the words protein and infection). Prions cause a number of fatal diseases such as ‘mad cow’ disease in cattle, scrapie in sheep and kuru and Creutzfeldt-Jakob disease (CJD) in humans. Prion diseases (being rich in β-sheets (about 43% β-sheet) (Griffith, 1967; Cappaia and Collins, 2004; Daude, 2004; Ogayar and Snchez-Prez, 1998; Pan et al., 1993; Reilly, 2000) belong to neurodegenerative diseases. Many experimental studies such as (Brown, 2000; Brown, 2001; Brown, 1994; Cappai and Collins, 2004; Harrison et al., 2010; Holscher, 1998; Jobling et al., 2001; Jobling et al., 1999; Kuwata et al., 2003; Norstrom and Mastrianni, 2005; Wegner et al., 2002; Laganowsky et al., 2012; Jones et al., 2012; Sasaki et al., 2008; Haigh et al., 2005; Kourie et al., 2003; Zanuy et al., 2003; Kourie, 2001; Chabry et al., 1998; Gasset et al., 1992) have shown that the normal hydrophobic region (113-120) AGAAAAGA of prion proteins is an inhibitor/blocker of prion diseases. PrP lacking this palindrome could not convert to prion diseases. The presence of residues 119 and 120 (the two last residues within the motif AGAAAAGA) seems to be crucial for this inhibitory effect. The replacement of Glycine at residues 114 and 119 by Alanine led to the inability of the peptide to build fibrils but it nevertheless increased. The A117V variant is linked to the GSS disease. The physiological conditions such as pH (Cappai and Collins, 2004) and temperature (Wagoner et al., 2011) will affect the propensity to form fibrils in this region. The 3D atomic resolution structure of PrP (106-126), i.e. TNVKHVAGAAAAGAVVGGLGG, can be looked as the structure of a control peptide (Cheng et al., 2011; Lee et al., 2008). Ma and Nussinov (2002) established homology structure of AGAAAAGA and did its molecular dynamics simulation studies. Recently, Wagoner et al. computer simulation studied the structure of GAVAAAAVAG of mouse prion protein (Wagoner, 2010; Wagoner et al., 2011). Furthermore, the author computationally clarified that prion AGAAAAGA segment indeed has an amyloid fibril forming property (Fig. 1).

However, to the best of the author’s knowledge, there is little X-ray or NMR structural data available to date on AGAAAAGA (which falls just within the N-terminal unstructured region (1.–123) of prion proteins) due to its unstable, noncrystalline and insoluble nature. This Chapter will computationally study the molecular modeling (MM) structures of this region of prions.

## 2. Molecular structures of prion AGAAAAGA amyloid fibrils

“Amyloid is characterized by a cross-β sheet quaternary structure” and “recent X-ray diffraction studies of microcrystals revealed atomistic details of core region of amyloid” (en.wikipedia.org/wiki/Amyloid and references (Nelson et al., 2005; Sawaya et al., 2007; Sunde et al., 1997; Wormell, 1954; Gilead and Gazit, 2004; Morley et al., 2006; Gazit, 2002; Pawar et al., 2005; and references therein). All the quaternary structures of amyloid cross-β spines can be reduced to the one of 8 classes of steric zippers of (Sawaya et al., 2007), with strong van der Waals (vdw) interactions between β-sheets and hydrogen bonds (HBs) to maintain the β-strands.

A new era in the structural analysis of amyloids started from the ‘steric zipper’- β-sheets (Nelson et al., 2005). As the two sheets zip up, HPs (Hydrophobic Packings) (& vdws) have been formed. The extension of the ‘steric zipper’ above and below (i.e. the β-strands) is maintained by HBs (but there is no HB between the two β-sheets). This is the common structure associated with some 20 neurodegenerative amyloid diseases, ranging from Alzheimer’s and type-II diabetes to prion diseases. For prion AGAAAAGA amyloid fibril structure, basing on the common property of potential energy minimization of HPs, vdws, and HBs, we will present computational molecular structures of prion AGAAAAGA amyloid fibrils.

### 2.1. Review on materials and methods, and results of MM models

*2.1.1. Hybrid method of steepest descent – conjugate gradient with simulated annealing*

X-ray crystallography finds the X-ray final structure of a protein, which usually need refinements using a simulated annealing protocol in order to produce a better structure. Thus, it is very amenable to use simulated annealing (SA) to format the models constructed. Zhang (2011a, 2011d) presents a hybrid method of global search SA with local steepest descent (SD), conjugate gradient (CG) searches. The hybrid method is executed with the following three procedures. (1) Firstly the SD method and then the CG method are executed. These two local search methods are traditional optimization methods. The former has nice convergence but is slow when close to minimums. The latter is efficient but its gradient RMS and GMAX gradient (Case et al., 2010) do not have a good convergence. (2) When models cannot be optimized further, we employ standard SA global search procedure. (3) Lastly, the SD and CG methods are used to refine the models. The PDB (Berman et al., 2000) templates used in (Zhang, 2011a, 2011d) are 2OKZ.pdb, 2ONW.pdb, 2OLX.pdb, 2OMQ.pdb, 2ON9.pdb, 2ONV.pdb, 2ONA.pdb, 1XYO.pdb, 2OL9.pdb, 2OMN.pdb, 2ONX.pdb, 2OMP.pdb, 1YJP.pdb of (Sawaya et al., 2007), but only the 2OMP and 1YJP template-based three MM-Models (Fig. 6a~6c in (Zhang, 2011a)) are successfully passed through the SDCG-SA-SDCG computational procedures.

*2.1.2. Hybrid method of discrete gradient with simulated annealing*

Zhang et al. (2011a, 2011d) used 3FVA.pdb as the pdb template to build two MM-Models (Figs. 11~12 in (Zhang et al., 2011a)). The Models were built using a hybrid SA Discrete Gradient (DG (Bagirov et al., 2008)) method. Then the Models were optimized using SDCG-SA-SDCG methods as in (Zhang, 2011a).

*2.1.3. Computational method of canonical dual global optimization theory*

Zhang et al. (2011, 2011d) used 3NHC.pdb, 3NVF/G/H/E.pdb templates to build several MM-Models (Figs. 9~11 in (Zhang et al., 2011b), and Figs. 5~8 in (Zhang, 2011b)). These Models were built in the use of canonical dual global optimization theory (Gao et al., 2012; Gao and Wu, 2012; Gao, 2000) and then refined by SDCG-SA-SDCG methods as in (Zhang, 2011a).

### 2.2. New material and method, and new MM-models

*2.2.1. New material*

This Chapter uses a suitable pdb file template 3NHD.pdb (the GYVLGS segment 127-132 from human prion with V129 (Apostol et al., 2010) from the Protein Data Bank to build MM-models of AGAAAAGA amyloid fibrils for prions.

*2.2.2. New computational method - computational method of simulated annealing evolutionary computations*

The computational methods used to build the new MM-Models will be simulated annealing evolutionary computations (SAECs), where SAECs were got from the hybrid algorithms of (Abbass et al., 2003) by simply replacing the DG method by the SA algorithm of (Bagirov and Zhang, 2003) and numerical computational results show that SAECs can successfully pass the test of more than 40 well-known benchmark global optimization problems (Zhang, 2011c).

*2.2.3. New MM-models*

The atomic-resolution X-ray structure of 3NHD.pdb is a steric zipper, with strong vdw interactions between β-sheets and HBs to maintain the β-strands (Fig. 2).

By observations of the 3rd column of coordinates of 3NHD.pdb and Fig. 2, G(H) chains (i.e. β-sheet 2) of 3NHD.pdb can be calculated from A(B) chains (i.e. β-sheet 1) by Eq. 1 (where T is Transpose of column vector) and other chains can be calculated by Eqs. 2~3:

Basing on the template 3NHD.pdb from the Protein Data Bank, three prion AGAAAAGA palindrome amyloid fibril models - an AGAAAA model (Model 1), a GAAAAG model (Model 2), and an AAAAGA model (Model 3) - will be successfully constructed in this Chapter. Because the template is a segment of 6 residues, the three shorter prion fragments are selected. This Chapter does not perform calculations on the full AGAAAAGA. Chains AB of Models 1~3 were respectively got from AB chains of 3NHD.pdb using the mutate module of the free package Swiss-PdbViewer (SPDBV Version 4.01) (http://spdbv.vital-it.ch). It is pleasant to see that almost all the hydrogen bonds are still kept after the mutations; thus we just need to consider the vdw contacts only. Making mutations for GH chains of 3NHD.pdb, we can get the GH chains of Models 1~3. However, the vdw contacts between Chain A and Chain G, between B chain and H chain are too far at this moment (Figs. 3~5).

Seeing Figs. 3~5, we may know that for Model 1 at least 3 vdw interactions B6.ALA.CB-H3.ALA.CB-B4.ALA.CB-H5.ALA.CB should be maintained (their distances in Fig. 3 are 7.82, 8.36, 9.04 angstroms respectively), for Model 2 at least 3 vdw interactions G4.ALA.CB-

A3.ALA.CB-G2.ALA.CB-A5.ALA.CB should be maintained (their distances in Fig. 4 are 7.16, 7.43, 9.31 angstroms respectively), and for Model 3 at least 3 vdw interactions A1.ALA.CB-G4.ALA.CB-A3.ALA.CB-G2.ALA.CB should be maintained (their distances in Fig. 5 are 3.45, 7.16, 7.43 angstroms respectively). For Model 1, fixing the coordinates of B6.ALA.CB and B4.ALA.CB, letting the coordinates of H3.ALA.CB and H5.ALA.CB be variables, we may get a simple Lennard-Jones (LJ) potential energy minimization problem just with 6 variables (see Eq. 9). Similarly, for Model 2 fixing the coordinates of A3.ALA.CB and A5.ALA.CB, letting the coordinates of G4.ALA.CB and G2.ALA.CB be variables, we may get a simple LJ potential energy minimization problem just with 6 variables (see Eq. 10); for Model 3, fixing the coordinates of A1.ALA.CB and A3.ALA.CB, letting the coordinates of G4.ALA.CB and G2.ALA.CB be variables, we may get a simple LJ potential energy minimization problem with 6 variables (see Eq. 11).

The vdw contacts of atoms are described by the LJ potential energy:

where ε is the depth of the potential well and σ is the atom diameter; these parameters can be fitted to reproduce experimental data or deduced from results of accurate quantum chemistry calculations. The (σ/r)^{12} term describes repulsion and the -(σ/r)^{6} term describes attraction. If we introduce the coordinates of the atoms whose number is denoted by N and let ε=σ= 1 be the reduced units, the Eq. 4 becomes into

where t_{ij} = (x_{3i−2} – x_{3j−2})^{2} + (x_{3i−1} – x_{3j−1})^{2} + (x_{3i} – x_{3j})^{2}, (x_{3i−2}, x_{3i−1}, x_{3i}) is the coordinates of atom i, N≥2. The minimization of LJ potential f(x) on R^{n} (where n = 3N) is an optimization problem:

Similarly as Eq. 4, i.e. the potential energy for the vdw interactions between β-sheets:

the potential energy for the HBs between the β-strands has the formula

where A, B, C, D are given constants. Thus, the amyloid fibril molecular modeling problem can be deduced into the problem to solve the mathematical optimization problem Eq. 6. Seeing Fig. 6, we may know that the optimization problem Eq. 6 reaches its optimal value at the bottom of the LJ potential well, where the distance between two atoms equals to the sum of vdw radii of the atoms. In this Chapter, the sum of the

vdw radii is the twice of the vdw radius of Carbon atom, i.e. 3.4 angstroms. The optimization problem Eq. 6 is a nonconvex complex optimization problem. By the observation from Fig. 6, we may solve its simple but equal convex-and-smooth least square optimization problem (or the so-called distance geometry problem or sensor network problem) with a slight perturbation if data for three atoms violate the triangle inequality. The following three optimization problems for Models 1~3 respectively are:

min f(x)= ½{ (x | () |

min f(x)= ½{ (x | () |

min f(x)= ½{ (x | () |

We may use any optimization algorithms or packages to easily solve problems Eqs. 9~11 and get their respective global optimal solutions (-13.062, 9.126, -3.336; -12.344, 6.695, -2.457), (-11.275, 6.606, 3.288; -5.461, 7.124, 2.424), (-12.149, 8.924, 1.229; -9.256, 11.007, 3.517), which were got by the SAEC algorithms in this Chapter. Input these global optimal solutions into Eq. 1, take average and tests then we get Eq. 12:

By Eq. 12, we can get close vdw contacts in Figs. 7~9.

From Figs. 3~5 to Figs. 7~9, we may see that the Optimization algorithm works and the computational experiences show us we had better at least define two sensors and two anchors in order to form a zipper between the two β-sheets. Next, in order to remove very close bad contacts, we relax Figs. 7~9 by a slight SDCG-Optimization in the use of Amber 11 (Case et al., 2010) and we get the optimized MM-Models 1~3. The other CDEF and LKJI chains can be got by parallelizing ABGH chains in the use of mathematical Eqs. 2~3. The new amyloid fibril models are useful for the drive to find treatments for prion diseases in the field of medicinal chemistry. The computational algorithms presented in this Chapter and their references therein are useful in materials science, drug design, etc.

Because Eqs. 9~11 are optimization problems with 6 variables only and these optimization problems are to minimize fourth-order polynomials, the proposed SAEC method and other computational methods can easily get the same optimal solutions to optimize the above three models.

*2.2.4. The practical LBFGS quasi-Newtonian method*

Energy minimization (EM), with the images at the endpoints fixed in space, of the total system energy provides a minimum energy path. EM can be done using SD, CG, and LBFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno). SD is robust and easy to implement but it is not most efficient especially when closer to minimum. CG is slower than SD in the early stages but more efficient when closer to minimum. The hybrid of SD-CG will make SD more efficient than SD or CG alone. However, CG cannot be used to find the minimization energy path, for example, when “forces are truncated according to the tangent direction, making it impossible to define a Lagrangian” (Chu et al., 2003). In this case, the powerful and faster quasi-Newtonian method (e.g. the LBFGS quasi-Newtonian minimizer) can be used (Chu et al., 2003; Liu and Nocedal, 1989; Nocedal and Morales, 2000; Byrd et al., 1995; Zhu et al., 1997). We briefly introduce the LBFGS quasi-Newtonian method as follows.

Newton’s method in optimization explicitly calculates the Hessian matrix of the second-order derivatives of the objective function and the reverse of the Hessian matrix (Dennis et al., 1996). The convergence of this method is quadratic, so it is faster than SD or CG. In high dimensions, finding the inverse of the Hessian is very expensive. In some cases, the Hessian is a non-invertible matrix, and furthermore in some cases, the Hessian is symmetric indefinite. Qusi-Newton methods thus appear to overcome all these shortcomings.

Quasi-Newton methods (a special case of variable metric methods) are to approximate the Hessian. Currently, the most common quasi-Newton algorithms are the SR1 formula, the BHHH method, the widespread BFGS method and its limited /low-memory extension LBFGS, and Broyden's methods (http://en.wikipedia.org/wiki/Quasi-Newton_method). In Amber (Case et al., 2010) and Gromacs (van der Spoel et al., 2010), LBFGS is used, and the hybrid of LBFGS with CG - a Truncated Newton linear CG method with optional LBFGS Preconditioning (Nocedal and Morales, 2000) - is used in Amber (Case et al., 2010).

### 2.3. New thinking about the construction of 3D-structure of a protein

If a NMR or X-ray structure of a protein has not been determined and stored in PDB bank yet, we still can easily get the 3D-structural frame of the protein. For example, before 2005 when we did not know the NMR structure of rabbit prion protein, we could get its homology model structure using the NMR structure of the human prion protein (PDB id: 1QLX) as the template (Zhang et al., 2006). We may use the homology structure to determine the 3D-structural frame of a protein when its NMR or X-ray structure has not been determined yet. The determination is an optimization problem described as follows.

“Very often in a structural analysis, we want to approximate a secondary structural element with a single straight line” (Burkowski, 2009: page 212). For example, Fig. 10 uses two straight lines that act as the longitudinal axis of β-Strand A (i.e. A chain), β-Strand B (i.e. B chain) respectively. Each straight line should be positioned among the Cα atoms so that it is closest to all these Cα atoms in a least-squares sense, which is to minimize the sum of the squares of the perpendicular

distances (*d*_{i}) from the Cα atoms to the strand/helix axis:

Define the vector *w*=(*w*_{x}, *w*_{y}, *w*_{z})^{T} for the axis. Then *d*_{i} represents the perpendicular vector going from Cα atom *a* to the axis:

|| | () |

According to Eqs. 13~14, for the β-Strand A – β-Strand B of AB chains, we get the following two optimization problems for Model 1 respectively:

min S_{A} = ( (-16.196)^{2} + 8.315^{2} + 1.061^{2} ) { 1- (-16.196*w*_{x} + 8.315*w*_{y} + 1.061*w*_{z} )^{2} /[( (-16.196)^{2} + 8.315^{2} + 1.061^{2} ) (*w*_{x}
^{2} + *w*_{y}
^{2} + *w*_{z}
^{2})]} + ( (-12.977)^{2} + 6.460^{2} + 1.908^{2} ) { 1- (-12.977*w*_{x} + 6.460*w*_{y} + 1.908*w*_{z} )^{2} /[( (-12.977)^{2} + 6.460^{2} + 1.908^{2} ) (*w*_{x}
^{2} + *w*_{y}
^{2} + *w*_{z}
^{2})]} + ( ( -9.178)^{2} + 6.745^{2} + 1.448^{2} ) { 1- ( -9.178*w*_{x} + 6.745*w*_{y} + 1.448*w*_{z} )^{2} /[( (- 9.178)^{2} + 6.745^{2} + 1.448^{2} ) (*w*_{x}
^{2} + *w*_{y}
^{2} + *w*_{z}
^{2})]} + ( ( -6.455)^{2} + 4.112^{2} + 1.558^{2} ) { 1- ( -6.455*w*_{x} + 4.112*w*_{y} + 1.558*w*_{z} )^{2} /[( (-6.455)^{2} + 4.112^{2} + 1.558^{2} ) (*w*_{x}
^{2} + *w*_{y}
^{2} + *w*_{z}
^{2})]} + ( ( -3.006)^{2} + 5.750^{2} + 1.782^{2} ) { 1- ( -3.006*w*_{x} + 5.750*w*_{y} + 1.782*w*_{z} )^{2} /[( (-3.006)^{2} + 5.750^{2} + 1.782^{2} ) (*w*_{x}
^{2} + *w*_{y}
^{2} + *w*_{z}
^{2})]} + ( ( -1.226)^{2} + 2.750^{2} + 0.233^{2} ) { 1- ( -1.226*w*_{x} + 2.750*w*_{y} + 0.233*w*_{z} )^{2} /[( (-1.226)^{2} + 2.750^{2} + 0.233^{2} ) (*w*_{x}
^{2} + *w*_{y}
^{2} + *w*_{z}
^{2})]},

min S | () |

We solve Eqs. 15~16 (taking the average of the coordinates of Cα atoms as initial solutions), getting their optimal solutions w1 = (-10.751, 6.428, 1.411)^{T}, w2 = (-7.960, 4.579, -2.256)^{T} respectively (Fig. 10). We may use the vectors w1, w2 and Eq. 12 to construct Chains GH and then build an optimal Model 1 (Aqvist, 1986; Abagyan and Maiorov, 1988; Orengo et al., 1992; Young et al., 1999; Foote and Raman, 2000). In (Burkowski, 2009: pages 213-216), *w*_{x}
^{2} + *w*_{y}
^{2} + *w*_{z}
^{2} =1 (i.e. *w* is a unit vector) is restrained and Eq. 13 becomes into a problem to seek the smallest eigenvalue (S*) and its corresponding eigenvector w of the following matrix:

((∑_{i=1}^{N}(a_{y}^{(i)})^{2} + (a_{z}^{(i)})^{2}, -∑_{i=1}^{N}a_{x}^{(i)}a_{y}^{(i)}, -∑_{i=1}^{N}a_{z}^{(i)}a_{x}^{(i)})^{T}, (-∑_{i=1}^{N}a_{x}^{(i)}a_{y}^{(i)}, ∑_{i=1}^{N}(a_{z}^{(i)})^{2} + (a_{x}^{(i)})^{2}, -∑_{i=1}^{N}a_{y}^{(i)}a_{z}^{(i)})^{T}, (-∑_{i=1}^{N}a_{z}^{(i)}a_{x}^{(i)}, -∑_{i=1}^{N}a_{y}^{(i)}a_{z}^{(i)}, ∑_{i=1}^{N}(a_{x}^{(i)})^{2} + (a_{y}^{(i)})^{2})^{T}).

This matrix is symmetric and positive definite, and its eigenvectors form an orthogonal basis for the set of atoms under consideration. In physics, it is called the inertial tensor involving studies of rotational inertia and its eigenvectors are called the principle axes of inertia. Furthermore, we may also notice that Eq. 13 can be rewritten as

where ||*d*_{i}||^{2} = (*a*^{(i)}_{x}
^{2} + *a*^{(i)}_{y}
^{2} + *a*^{(i)}_{z}
^{2}) (*w*_{x}
^{2} + *w*_{y}
^{2} + *w*_{z}
^{2}) – ( *a*^{(i)}_{x}
*w*_{x} + *a*^{(i)}_{y}
*w*_{y} + *a*^{(i)}_{z}
*w*_{z} )^{2.} Thus, Eq. 17 can be easily solved by the canonical dual global optimization theory (Gao et al., 2012; Gao and Wu, 2012; Gao, 2000), by the ways of solving the canonical dual of Eq. 17 or solving the quadratic differential equations of the prime-dual Gao-Strang complementary function (Gao et al., 2012; Gao and Wu, 2012; Gao, 2000) through some ordinary or partial differential equation computational strategies.

## 3. Conclusions

To date the hydrophobic region AGAAAAGA palindrome (113-120) of the unstructured N-terminal region (1-123) of prions has little existing experimental structural data available. This Chapter successfully constructs three molecular structure models for AGAAAAGA palindrome (113-120) by using some suitable template 3NHD.pdb from Protein Data Bank and refinement of the Models with several optimization techniques within AMBER 11. These models should be very helpful for the experimental studies of the hydrophobic region AGAAAAGA palindrome of prion proteins (113-120) when the NMR or X-ray molecular structure of prion AGAAAAGA peptide has not been easily determined yet. These constructed Models for amyloid fibrils may be useful for the goals of medicinal chemistry.

This Chapter also introduces numerous practical computational approaches to construct the molecular models when it is difficult to obtain atomic-resolution structures of proteins with traditional experimental methods of X-ray and NMR etc, due to the unstable, noncrystalline and insoluble nature of these proteins. Known structures can be perfectly reproduced by these computational methods, which can be compared with contemporary methods. As we all know, X-ray crystallography finds the X-ray final structure of a protein, which usually need refinements using a SA protocol in order to produce a better structure. SA is a global search procedure and usually it is better to hybrid with local search procedures. Thus, the computational methods introduced in this Chapter should be better than SA along to refine X-ray final structures.