Molecular dynamics (MD) and normal mode analysis (NMA) are very useful methods for characterizing various dynamic aspects of biological macromolecules. In comparison to MD, NMA is computationally less expensive which facilitates the quick and systematic investigation of protein flexibility and dynamics even for large proteins and protein complexes, whose structure was obtained experimentally or in silico. In particular, NMA can be used to describe the flexible states adopted by a protein around an equilibrium position. These states have been repeatedly shown to have biological relevance and functional significance. This chapter briefly characterizes NMA and describes the elastic network model, a schematic model of protein shape used to decrease the computational cost of this method. Finally, we will describe the applications of this technique to several large proteins and their complexes as well as its use in enhancing protein homology modeling.
- normal mode analysis
- elastic network model
- crystal structure
- protein dynamics
- homology modeling
Often there is a high demand for the structures of biologically important proteins, especially those which are large or part of complex systems. However, it is not always possible, for many reasons, to get a high-resolution structure experimentally using X-ray crystallography, NMR, or cryo-electron microscopy. Among the many problems, we can mention low protein expression, low protein stability, high aggregation or poorly diffracting crystals . In this situation,
2. Normal mode analysis
Normal mode analysis is a technique, based on the physics of small oscillations, that can be used to describe the flexible states accessible to a protein around an equilibrium position. The idea is that when an oscillating system at equilibrium, for example a protein in an energy minimum conformation, is slightly perturbed, a restoring force acts to bring the perturbed system back to its equilibrium conformation. A system is defined to be in equilibrium or at the bottom of a potential minimum when the generalized forces acting on it are equal to zero. At the minimum energy conformation, represented by the generalized coordinates , the potential energy equation can be written as a power series in :
where and are the instantaneous configuration of components and and the deviation of component from its equilibrium configuration is given by . is the potential energy equation of the system and, for proteins, usually takes the form of one of the commonly used molecular dynamics force fields . The first term in the series represents the minimum value of the potential and may be set to zero and the second term will be zero at any local minimum, so the potential can be written as
where is the Hessian matrix which contains the second derivatives of the potential with respect to the components of the system.
It is also necessary to consider the kinetic energy () of the system since we are interested in dynamics. For component , this can be given by
where is a diagonal matrix containing the mass of each particle. The entire equation of motion can be written as
One solution of this equation is the oscillatory equation
where is the amplitude of oscillation, is the frequency, and is a phase factor. By substituting this into Eq. (4), the equation of motion can be rewritten as
where the matrix contains the eigenvectors of the Hessian matrix and is a diagonal matrix containing the eigenvalues. The eigenvectors are the normal mode vectors and describe in which direction and how far each particle in the system moves with respect to each other particle; the eigenvalues give the squares of the frequencies with which all the particles involved with a particular mode vibrate. While the eigenvectors can tell in which direction and how far each particle moves with respect to the others, it does not give absolute displacements. NMA alone therefore cannot normally be used to get the displacement amplitudes of a given normal mode .
The vibrational energy of the system is generally equally divided so that every vibrational mode has the same energy and the average oscillation amplitude of a given mode scales as the inverse of its frequency. Thus, modes with higher frequencies, which will have energetically greater displacements, typically describe rapid but small amplitude local motions involving relatively few atoms, while those with lower frequencies will describe slower displacements involving larger numbers of atoms and describe large-scale conformational changes. As the name of the method indicates, these vibrational modes are normal to one another, meaning that they move independently: the excitation of one mode does not trigger the motion of a second one and the general motion of the system can be described by a superposition of all the modes. These normal modes yield analytical solutions to the equations of motion: for a given set of initial positions and velocities, NMA allows us to calculate where each atom of the system in question will be at any subsequent time subject to the small oscillation approximation. (A more complete treatment of the theory behind NMA and its advantages and limitations may be found in ).
NMA was first applied to peptides in 1979  and was subsequently used to study the whole proteins bovine pancreatic trypsin inhibitor (BPTI) [14, 15], hexokinase , crambin , human lysozyme [17, 18], ribonuclease , and myoglobin [19, 20]. Application of the method to larger systems was hampered by its computational expense. With advances in computer technology and the development of more efficient algorithms, it has become possible to examine larger structures, including the skeletal ryanodine receptor , Ca-ATPase , GroEL [23, 24, 25], the ribosome [26, 27, 28], the yeast nuclear pore complex , and virus capsids [30, 31, 32]. All these will be examined in more detail in Section 3 below.
2.1 The elastic network model
NMA is less computationally expensive than molecular dynamics simulation, but it is still not trivial for proteins containing many thousands of atoms. The first problem is that the structure to be studied must be energy minimized to ensure that the starting conformation is in a true minimum relative to the chosen force field. The minimization must then proceed until machine precision is reached, typically below 0.001 kJ/mol-nm, which is much more computationally demanding than the minimizations normally employed for other tasks. Frequently, the results of this process distort the structure, leading to NMA being carried out on a structure different from the experimentally determined one. The second problem, and the computationally limiting factor, is the diagonalization of the Hessian matrix. For classical, all-atom NMA, all atoms in a structure, including the hydrogen atoms, must be used, making the total Hessian in size. For large proteins with thousands of atoms this can become computationally difficult very quickly. Consequently, a number of coarse-grained approximate methods have been developed to overcome both of these limitations [6, 11]. The most common and widely used of these is the elastic network model (ENM).
The general idea of ENMs, first put forward by Tirion in 1996 , is to replace the complicated semi-empirical force fields used in standard NMA with a simple harmonic potential:
where is the distance between atoms or nodes and , is the distance in the initial structure, and is a spring constant assumed to be the same for each –pair. It should be noted that by design the input configuration is assumed to be a minimum energy one, and energy minimization against a potential is therefore unnecessary. in this equation refers to a cut-off radius and the sum is only over all pairs less than this value. is somewhat arbitrary, but in practice, values of between 7.0–8.0 Å are used based on the observed distances between non-bonded atoms in experimental structures [34, 35]. Most frequently, only the Catoms are used for these calculations because they are sufficient for studying the backbone motions of the protein and are all that is necessary for characterizing the lowest-frequency normal modes.
A number of different ENM formulations have been developed. The simplest one is the Gaussian network model (GNM) developed by Bahar and co-workers . The GNM replaces the Hessian matrix with an Kirchoff matrix (). is defined in terms of spring constants , which are created based on the assumption that the separation distance between the th and th Catoms in the protein follows a Gaussian distribution. The potential is given by
where is a vector expressing the fluctuations in distance between the th and th Catoms. The model assumes that these fluctuations are isotropic; consequently, no information about the three-dimensional directions of motion can be obtained. Eigenvalue decomposition of does allow the contribution of individual modes to the equilibrium dynamics to be calculated, as well as the relative displacement of residues along each mode axis, the cross-correlation between the residues in the individual modes, and square displacement profiles.
Some form of the anisotropic network model (ANM) is perhaps the most commonly encountered ENM. This is the form originally suggested by Tirion  and incorporated into the Molecular Modeling Toolkit (MMTK) by Hinsen , and widely used in a number of other tools. The ANM gives the same information as the GNM, but also provides information on the directionality of the fluctuations. On the other hand, the mean-square fluctuations (-factors) and cross-correlations it produces do not agree quite as well with experiment as GNM [38, 39]. In compensation, ANMs can be used to generate alternative conformations in the close neighborhood of the starting structure by deforming the structure along the lowest frequency modes . Two groups led by Zheng  and Lin and Song  developed models which combined the best features of GNM and ANM into a single method.
While coarse-graining does allow ENMs to be scaled to very large models, it does so by losing detailed information on local structural movements. The rotating-translating blocks (RTB) model of Sanejouand  was constructed to alleviate this. In this approach, the protein or other macromolecule is divided into blocks made up of one or a few residues connected by elastic springs. Next, it is assumed that a good approximation to the low-frequency normal modes can be made by forming linear combinations of the local rotations and translations of these individual blocks. Consequently, a projection matrix is constructed and used to build a projected Hessian matrix , which is diagonalized with , where is the eigenvector matrix diagonalizing and is the corresponding eigenvalue matrix. The resulting eigenvectors can be projected back into the full -dimensional space using , where is a matrix containing the lowest-frequency approximate normal modes.
In a survey of the recent structural biology literature, Bauer
3.1 Application to experimental structures
3.1.1 Ryanodine receptor
The ryanodine receptors (RyRs) are the largest presently known ion channels, with molecular weights of 2.2 MDa. They are homotetramers embedded in the membrane of the sarcoplasmic reticulum of myocytes, where they play a key role in excitation-contraction coupling. They regulate Carelease from the sarcoplasmic reticulum by undergoing a closed-to-open gating transition in response to an action potential or calcium binding. RyRs are found in all animals. Three isoforms have been identified in mammals: RyR1 (predominantly expressed in skeletal muscle), RyR2 (cardiac muscle) and RyR3 (present in several tissues including the brain, diaphragm, and testes) [45, 46, 47]. RyR malfunction leads to severe muscular disorders, including malignant hyperthermia, central core disease, tachycardia, dysplasia, and others . The first high-resolution cryo-EM structures of the complete rabbit skeletal RyR were reported in 2015 [48, 49, 50], and were followed by a number of other high-resolution structures of the skeletal and cardiac isoforms in both their open and closed conformations, either alone or bound to regulators (For review see Bauerová-Hlinková
A number of MD simulation studies have been reported for both the skeletal and cardiac RyR isoforms, but only covering small parts of the whole channel; in particular, the N-terminal domain (roughly the first 600 amino acids) [52, 53, 54, 55] and parts of the channel domain [56, 57, 58] were studied in this way. Generally, these studies focused on identifying how known disease-causing mutations affected the dynamics of these fragments.
Shortly after the appearance of the first high-resolution cryo-EM structures, Wenjun Zheng published a normal-mode analysis on the 3.8-Å RyR1 structure . He found that the largest collective motions of the closed form involved large outward and downward movements of the peripheral domains, in in good agreement with the conformational variations observed in multiple cryo-EM structures of the closed form later reported by des Georges
The Ca-ATPase may be thought of as the partner of the RyR. While RyR releases Cafrom the sarcoplasmic reticulum into the cytosol, the Ca-ATPase pumps it back in, against a large concentration gradient, at the rate of two Caions per hydrolyzed ATP. Conventionally, the Ca-ATPase is thought to take two different states: E1, which has high affinity for Ca, and E2, which has much lower affinity [60, 61]. In addition to the binding and dissociation of Ca, ATP hydrolysis and dephosphorylation of the resulting phosphorylated Asp351 result in the presence of 4–7 different physiological states . At least 54 crystal structures have been determined since the first one in 2000  and they cover nearly all the conformations of the different physiological states. They show large conformational rearrangements during the reaction cycle and have been investigated by a number of computational methods . Most of the NMA studies occurred soon after the initial structures by Toyoshima
The initial structure  showed that the ATPase consisted of three cytoplasmic domains, labeled A (activation), N (nucleotide binding), and P (phosphorylation), and 10 transmembrane helices (M1–M10). As the names suggest, the cytoplasmic domain N holds most of the ATP binding site and P holds the phosphorylation site. Subsequent structures [65, 66] showed that the A domain also participates in ATP binding and has a crucial role in dephosphorylation. In the first structure, taken to be in the E1 conformation, the N and A domains were widely separated. Subsequent E2 structures showed that ATP binding induces a large movement in the N domain and a smaller, though still considerable, motion in the A domain. NMA on only the initial E1 structure  found that the N domain seemed to be the most mobile and that rotational hinges were present between the N and P, and A and M domains. It was also suggested that the transmembrane -helices M2, M4, and M8 likely played an important role in Carelease into the sarcoplasmic reticulum lumen. After the second structure in the E2 conformation became available , two studies examined the normal modes that participate in the E1 E2 conformational change [22, 68]. They found that only a few of the lower-frequency modes were needed to describe the E1 E2 transition, which predominantly involved the movement of the A and N domains to close the cytoplasmic headpiece. They also both predicted that the transmembrane domain was likely to undergo a twisting motion which would eliminate the Cabinding sites and open up the channel. The results of these studies were partially confirmed by the subsequent structures [65, 66]. The first NMA study  did correctly predict that the largest conformational changes would be observed in the positions of the N and A domains, and the movement of helices M2, M4, and M8 was important for Carelease, but they all missed that transmembrane helix M5 underwent a sharp bend following ADP release. They also failed to note that a conserved loop from domain A which interacts with the phosphate binding site shifts conformation and plays an important role in hydrolyzing the phosphate free from Asp351. Thus, these coarse-grained NMA studies managed to successfully predict many of the large-scale movements, but missed an important local conformational change and a large distortion of an element of secondary structure.
The structural transitions between these forms have been well described [70, 71, 72, 73, 74, 75], and NMA has been applied to study the dynamics of both the individual subunits as well as the entire GroEL–GroES complex [5, 23, 24, 25, 76, 77, 78]. The earliest studies [23, 24] examined individual GroEL subunits and found that there is a very close relationship between their flexibility and the conformational changes observed for the entire complex. When ATP binds to a given subunit, the subunit changes conformation closely following a few low-frequency normal modes .
NMA was also calculated on an ENM of the whole GroEL–GroES complex . These authors found that the slowest normal modes revealed a wide variety of motions which depended on the central cavity of the structure. These included the opposite twisting of the two GroEL rings combined with a flattening and expansion of the GroES cap; bending, shear, opposed radial breathing of the two rings, and stretching and contraction along the complex’s long axis were also observed. They concluded that the mechanical motions driven by the different modes provide changing binding surfaces and differently sized cavities in the interior which might enable differently shaped substrates to be accommodated; possibly, these shifts might also be used during the refolding process.
3.1.4 The Ribosome
The ribosome is a molecular machine for translating the nucleotide sequence encoded in an mRNA transcript into a polypeptide sequence that folds into a functional protein. In prokaryotes, it is composed of a 50S subunit (containing a 23S rRNA, a 5S rRNA, and 34 proteins) and a 30S subunit (comprised of a 16S rRNA and 21 proteins) which together combine to form a 70S ribosome. The 30S subunit binds the mRNA and the anticodon end of the bound tRNA and is responsible for mRNA decoding. The 50S subunit interacts with the ends of the tRNA bound to the transferred amino acid and catalyzes peptide bond formation. The complete assembly features three sites for tRNA placement: an A (aminoacyl) site, a P (peptidyl) site, and an E (exit) site.
Several different crystal structures of ribosomes from different organisms and in different states have been determined by both X-ray crystallography and cryo-EM. The earliest were the 30S subunit from
Many other complex motions have been observed, especially at the mRNA decoding center in the 30S subunit . In particular, the mRNA A site was found to be more flexible than the P site, which was consistent with the experimentally observed -factors. It also agrees with the observation that the A site is able to accommodate a diverse set of substrates and that the P site needs to be more rigid to ensure the fidelity of the codon–anticodon matching. The collective dynamics of the exit tunnel for the growing polypeptide was also examined . Here, it was found that the tunnel could be generally divided into three regions, entrance, neck, and exit, based on the low-frequency motions of the tunnel lining. Generally, the middle parts of the tunnel move in a complex way toward the tunnel exit, while the parts near the exit itself rotate around the tunnel axis. NMA of ENMs of the tRNAs were also examined and shown to be similar to the range of conformations observed from multiple experimental tRNA structures . By comparing the normal modes of tRNAs alone and bound to the ribosome, they also noted that the ribosome acts to suppress all internal tRNA motions, only allowing it to move by rigid-body translation .
3.1.5 Yeast nuclear pore complex
The nuclear pore complex (NPC) is an enormous macroassembly that regulates the import and export of a large variety of substances (including proteins, nucleic acids, and small molecules) from the nucleus . It is an octagonal complex composed of some 30 different proteins called nucleoporins. The yeast NPC has a mass of around 60 MDa while the vertebrate one is around 125 MDa. The yeast NPC is a ring that is around 100 nm across with a central pore of about 30 nm. The eight different subunits are termed “spokes” and each spoke exhibits pseudo 2-fold symmetry, giving the complex as a whole pseudo 16-fold symmetry. With 16 copies each, the 30 different nucleoporins make a total complex of 450–480 proteins. The central channel is coated with several “FG nucleoporins,” which contain many structurally disordered phenylalanine (F) and glycine (G) repeats. These FG nucleoporins form a selective barrier: small particles (30 kDa) can diffuse freely through the pore, while larger proteins require the assistance of karyopherin transport factors. The vertebrate NPC possesses, in addition to a central core, additional structural elements, including a cytoplasmic ring, a nuclear ring, and a luminal ring .
By following an integrative approach that combined data from many different sources, a coarse-grained structural model of the yeast NPC was developed [92, 93]. This model was then used for a coarse-grained NMA using an ENM . This study found that two types of collective modes were predicted to be favored: a global bending mode and an extension and contraction mode that oscillated between a circle and an ellipse. Several different coarse-grained representations were tried and it was found that a simple model of a torid with an axial varying mass density was sufficient to capture the main dynamic features. It was also found that the number of spokes significantly affected the number of symmetric low-frequency modes: torids composed of eight spokes had access to symmetric modes that torids composed of seven or nine modes did not. A similar NMA study created using a different coarse-grained toridal model came to similar conclusions .
3.1.6 Virus Capsids
Aside from the ryanodine receptor and the nuclear pore complex, all of the forgoing examples are molecular machines of some sort that transform the chemical energy of ATP hydrolysis into mechanical motion. Viral capsids are different in that they are more of an architectural than a mechanical structure. Viral capsids encapsulate and protect the viral genome during its spread from cell to cell during infection. They can display a number of different surface features, including pores, canyons, spikes, and pillars. They typically consist of more than 100 protein subunits and have diameters of 100 nm or more. Capsids with more than 60 subunits that maintain their icosahedral symmetry are typically made up of collections of pentamers and hexamers . The most commonly studied viral capsids form a spherical or icosahedral shape. NMA has been used to study the mechanical properties and conformational changes of these capsids for nearly 20 years (reviewed by May ). Even using coarse-grained NMA methods, the computational loads required to study whole capsids are still non-trivial, so a variety of symmetry-based approaches have been created to reduce them [32, 97, 98, 99, 100, 101, 102, 103].
One of the most thoroughly studied viral capsids is that of bacteriophage HK97 [31, 99, 104]. Tama and Brooks [31, 105] used NMA to study the swelling of a number of viral capsids, including cowpea chlorotic mottle virus (CCMV), Nudaurelia capensis virus (NV), and HK97. They found that for CCMV and NV, only the single lowest-frequency mode was needed to describe the swelling, while HK97 required the two lowest-frequency modes. The reason for this appeared to be that the CCMV and NV, being spherical, required only a single mode, while HK97 not only expands, but also changes conformation from spherical to icosahedral. Generally, these studies, together with two others [106, 107], showed that the functional dynamics of the capsids are dictated by their structure, which is why only the lowest-frequency modes are needed. The HK97 expansion and conformational transition occurs during its maturation, and NMA has also been applied to study the HK97 maturation pathway [99, 104]. During maturation, a spherical procapsid undergoes substantial expansion together with a conformational change from spherical to icosahedral to form a mature capsid. These studies found that only a few low-frequency icosahedral modes were needed to account for most of the maturation expansion and that maturation appears to occur through the puckering of the pentamers followed by the flattening and cross-linking of the hexamers.
3.2 NMA in homology modeling studies
3.2.1 Membrane proteins
One of the protein groups in which homology modeling has been used together with NMA is multidomain transmembrane proteins, in particular the human nicotine acetylcholine receptor (hnAChR)  and
Similar approaches were use in both studies. First a homology model of the target protein was constructed based on the known structures of similar proteins, followed by the application of NMA. The aim of both studies was to better understand channel gating and the particular structural changes associated with it [108, 109]. The hnAChR study also involved predicting the ligand binding site .
Enzymes have also been studied using a combination of HM and NMA. One example is the
3.2.3 Cell division and transport proteins
One of the longest-studied cellular processes is cell division  and cellular transport . Regarding cell division, a combination of HM and NMA was used to study a yeast cohesin, an essential ring-shaped chromosome maintenance protein that mediates sister chromatid cohesion, homologous recombination, and DNA looping. This protein is a member of the structural maintenance of chromosomes (Smc) family, which exists in all eukaryotes . In yeast, cohesin mainly consists of two Smc proteins, Smc1 and Smc3, both of which adopt long, anti-parallel coiled-coil regions that are separated by two globular regions: an ATP-binding head domain and a hinge region. The aim of the HM and NMA study  was to reveal the missing molecular details of how the two halves of the hinge region open to create an entry gate for DNA. In agreement with experimental data, the constructed yeast cohesin HM model showed that the bending motion of the cohesin ring is able to adopt a head-to-tail conformation. At the interface of the cohesin heterodimer, low-frequency conformational changes were observed to deform the highly conserved glycine residues present there. Normal mode analysis further revealed that the docking of large globular structures, such as the nucleosome and accessory proteins, to cohesin notably affected the mobility of the coiled-coil regions. Moreover, fully solvated molecular dynamics calculations, performed specifically on the hinge region, indicated that hinge opening starts from one side of the dimerization interface and is coordinated by the highly conserved glycine residues .
Normal mode analysis is a very useful technique for determining which conformational states are accessible to a given macromolecule. It can provide much of the same information given by more computationally expensive methods, such as molecular dynamics simulation, at only a fraction of the cost. It can be used by itself or in tandem with HM to characterize the general flexibility and domain movements of a molecule, to produce possible alternative conformations and confirm observed ones, and to describe the conformational changes that occur or might occur during substrate binding, product release, or catalytic activation. We have illustrated its utility using several examples of its application to a number of large biologically important proteins and protein complexes from bacteria, eukaryotes, and viruses. The biological relevance of the
The authors would like to thank Eva Kutejová for general support during the writing of this chapter.
This research was funded by Vedecká Grantová Agentúra MŠVVaŠ SR and SAV grant number 2/0131/20.
Conflict of interest
The authors declare no conflict of interest.