High-Throughput Simulations of Protein Dynamics in Molecular Machines: The ‘Link’ Domain of RNA Polymerase

the a the the Molecular Dynamics is a two-volume compendium of the ever-growing applications of molecular dynamics simulations to solve a wider range of scientific and engineering challenges. The contents illustrate the rapid progress on molecular dynamics simulations in many fields of science and technology, such as nanotechnology, energy research, and biology, due to the advances of new dynamics theories and the extraordinary power of today's computers. This second book begins with an introduction of molecular dynamics simulations to macromolecules and then illustrates the computer experiments using molecular dynamics simulations in the studies of synthetic and biological macromolecules, plasmas, and nanomachines. Coverage of this book includes: Complex formation and dynamics of polymers Dynamics of lipid bilayers, peptides, DNA, RNA, and proteins Complex liquids and plasmas Dynamics of molecules on surfaces Nanofluidics and nanomachines


Introduction
Complex molecular machines are involved in all information-processing steps in cells. The handling of information-bearing macromolecules (predominantly nucleic acids and proteins) requires a range of catalytic capabilities, such as the sequence-specific synthesis of macromolecular entities from smaller precursors and further processing by breaking/joining of pre-existing molecular components. These catalytic activities have to be spatially and temporally precisely controlled to ensure accuracy and efficiency levels that are commensurate with the associated biological functions (Lyubimov et al., 2011). Unlike metabolic enzymes, which are often medium-sized enzymes with freely accessible active sites, the enzymes associated with replication, transcription, translation, recombination are typically large multi-subunit protein complexes capable of a complex conformational spectrum. This conformational spectrum manifests itself through the dynamic features of individual domains within these molecular machines: many of the domains act explicitly as nanomechanical elements by serving as allosteric sensors, molecular hinges and motor units (Bustamante et al., 2011;Heindl et al., 2011). A deeper understanding of the functions of molecular machines is currently a high-priority topic and ultimate success will strongly depend on a combination of 'wet-lab' experimental techniques (X-ray crystallography, NMR, spectroscopic methods, site-directed mutagenesis and chemical cross-linking) with sophisticated computational simulations. Fully atomistic molecular dynamics (MD) simulations offer the most comprehensive and detailed insights into the behavior of molecular entities and are therefore the method of choice for attempts to unravel the structural basis of molecular mechanisms (reviewed in Karplus & McCammon, 2002;Freddolino et al., 2010;McGeagh et al. 2011;Schlick et al., 2011).
The ultimate goal of MD studies is the complete and comprehensive simulation of molecular machines operating within a relevant time span. Since the functions of molecular machines critically depend on a series of tightly orchestrated conformational changes that typically occur in the micro-and milliseconds range, such a goal is not yet routinely achievable. Currently available computational resources are still insufficient for simulating the molecular dynamics of macromolecular entities for the length required for complex conformational changes to take place. We are therefore focusing on the more practicable goal of studying the dynamic properties of individual components of molecular machines in order to obtain a better understanding of their nanomechanical properties. In any natural or man-made machine, the mechanical properties of individual components determine both their functional capabilities, as well as their interactions with other parts. In many cases a description of these properties allows either a clearcut deduction of the role of such components within the overall mechanism, or offers at least valuable suggestions for further experimental approaches. Most importantly, once the nanomechanical properties of individual components are understood in sufficient detail, their mode of interactions within defined assemblies can be modelled with increased accuracy and confidence. Such approaches result in insights into complex molecular mechanisms generated from the coordinated interplay between multiple protein domains.
Efforts in my laboratory are predominantly focused on understanding the intricate mechanisms of an enzyme playing a key role in gene expression: RNA polymerase (RNAP). Cellular RNAPs are highly conserved in evolution (ranging from bacteria to archaea and eukaryotes) and consist of a number of different subunits (reviewed in Werner, 2008). The large size and intrinsic complexity of these enzymes presents a particular challenge to understand the various mechanisms involved, such as separation and re-annealing of the DNA strands and the template-directed synthesis of an RNA transcript. Although structures of RNAPs in various states/conformations have been determined (e.g. Zhang et al., 1999;Cramer et al., 2001;Gnatt et al., 2001;Vassylyev, 2002), we are currently still far from having a complete catalogue of the most important conformations. Maybe even more importantly, the intermediate structural transitions between stable structural states are likely to be kinetically short-lived and are therefore very unlikely to be captured in the crystallized state required for high-resolution structural studies (Kaplan & Kornberg, 2008). Having a detailed understanding of such short-lived and metastable intermediate structures will, however, be critical for piecing together the conformational changes required for the various biological activities of RNAPs. Within the context of a larger strategy aimed at investigating the nanomechanical mechanisms underlying RNAP function, we are pursuing a two-pronged strategy: a laboratory-based robotic high-throughput approach capable of studying the functional consequences of a large number of site-directed mutants (the "RNAP Factory"; Nottebaum et al., 2008), and a computer-based in-depth investigation of nanomechanical properties of domains (or domain assemblies) using MD methods. The combination of these two complementary methods has already been proven to be highly effective for obtaining further insights into the nanomechanical 'motor' of RNAPs, as represented by the catalytic site and its surrounding protein domains (Fig. 1).
Our most recent studies focused on a prominent component of the catalytic site, the Bridge Helix. This structure is a 35 amino acid long -helix which spans the catalytic site and multiple lines of evidence suggest that the Bridge Helix is, in conjunction with the Trigger Loop, predominantly involved in the translocation of nucleic acid substrates through the catalytic site. High-throughput mutagenesis experiments, combined with systematic molecular dynamics simulations, have demonstrated the presence of two hinges ('BH-H N ' and 'BH-H C ') which bend during the nucleotide addition cycle to propagate the DNA template strand and the nascent transcript through the active site (Klug, 2001;Bar-Nahum et al., 2005;Tan et al., 2008;Brueckner et al., 2009;Seibold et al., 2010;Weinzierl, 2010a and2010b;Heindl et al., 2011). While the C-terminal hinge ('BH-H C ') may be actively involved in pushing the nascent RNA out of the active site to create space for the next incoming ribonucleotide triphosphate (rNTP; Klug, 2001;Bar-Nahum et al., 2005;Tan et al., 2008;Brueckner et al., 2009), the role of the more recently discovered N-terminal hinge (BH-H N ) is still poorly understood. In contrast to BH-H C , the BH-H N region is not in direct contact with any of the substrates, suggesting that BH-H N interacts with the catalytic site more indirectly via allosteric contacts (Weinzierl, 2010b). Of the domains surrounding BH-H N , the 'Link' domain emerges as the most likely candidate for playing such a communication role. As can be seen in Fig. 1, the Link domain consists of a short -helix which contacts directly the N-terminal portion of the Bridge. One particular residue present in the -helical portion of the Link domain (arginine 766 in the RPB2 subunit of S. cerevisiae RNAPII) is evolutionarily invariant and structural studies suggest that it plays a key role in contacting the -phosphate of the rNTP in the active site (Fig. 2).
A simple working hypothesis therefore suggests that the Bridge Helix N-terminal hinge can 'sense' -via the Link domain -whether the catalytic site is loaded with a correctly positioned rNTP before it is joined to the nascent transcript. When the - pyrophosphate group is cleaved off during nucleotide addition and diffuses out of the catalytic site, this particular contact is lost and may result in kinking of BH-H N . This model thus proposes that the Link domain acts as a conformational sensor that communicates the state of the catalytic site (empty, rNTP loaded, nucleotide addition and pyrophosphate release) in order to influence the conformational state of Bridge Helix. In mechanistic terms, completion of the nucleotide addition reaction signals to the Bridge Helix to initiate translocation of the DNA-RNA hybrid to create space for the next rNTP (  Taking into account the possible role of the Link domain in communicating the rNTP loading state of the catalytic site to the Bridge Helix, we decided to investigate the mechanistic properties of this system in more detail. In particular, we are interested in studying the conformational spectrum of the Link domain to gain as much information as possible before initiating an experimental high-throughput mutagenesis approach. As pointed out earlier, it seems highly probable that molecular dynamics simulations are nowadays sufficiently reliable to deduce key nanomechanical properties inherent in a protein domain. We specifically wanted to simulate the Link domain extensively under highly standardized conditions to capture a broad spectrum of conformational changes the domain is capable of. We expected that the data obtained in this way would provide quantitative evidence for the relative stability of different parts of the domain and highlight regions that could potentially be involved in allosteric switching processes.

Implementation
The implementation of the strategy outlined above requires a considerable computational and organisational infrastructure. Here I will discuss the various issues that need to be considered and will provide an overview of the methods we developed to map out an universally applicable approach that can also be used by other researchers to study different systems.

What do we want to achieve?
Before proceeding further, we need to define the desirable outcome of our MD simulation strategy. In order to 'get to know' the dynamic properties of a protein domain, two key requirements must be met: First and most obvious, we need to simulate a domain for a sufficient amount of time so that we can gather quantitative data regarding the overall features of the domains, such as conformational spectrum and overall stability, local variations in secondary structure RMSD values and other information of interest. The simulation time that is considered to be 'sufficient' has increased over the last few years with the steady increase in computational resources available, but is nowadays generally considered to be in the 10-100 nanosecond range. For many stably folded domains, simulation over such a time-frame will allow data collection sufficient to quantitate, for example, the half-life of secondary structure elements. The second key requirement is the need to run multiple independent simulations. In previous work on the Bridge Helix domain (Weinzierl, 2010b;Heindl et al., 2011) we encountered structural features that behaved stochastically, i.e. not every simulation run displayed the full conformational spectrum. We found that kinking of the Bridge Helix hinges observed in some molecular dynamics production runs resulted in very stable alternative conformations that were essentially irreversible within the tens of nanosecond simulation time. Such 'trapped' conformations play almost certainly an important role in molecular switching processes occurring in the millisecond range, but would result in a biased picture of the conformational spectrum of a domain if only a small number of simulations was carried out. For practical purposes (and based on practical experience) we therefore typically perform approximately 50 independent simulations of 20 nanoseconds each to maximize the chance of detecting and quantitating relevant conformational changes occurring within small protein domains of fewer than 30 amino acids (this results in an overall simulation timeof one microsecond).  (Weinzierl, 2010b;Heindl et al., 2011), we switched for our most recent work to high-performance personal computer systems based on workstations. The availability of multi-core CPUs, powerful graphics cards and affordable high-capacity mass storage media has over the last few years enabled the construction of workstation computing systems that are capable of providing the processing power for executing molecular dynamics simulations on a useful time scale. The custom-built workstation is based on liquid cooled dual Xeon 5690 Westmere CPUs clocked at 3.47 GHz with 24 GByte of DDR-3 RAM and running 64-bit Windows 7 Ultimate as operating system (more technical details are available on request). This 12 core (24 hyperthread) platform is capable of delivering 'raw' computing power consistently in excess of 120 GFlops in Linpack output. The MD simulation software of choice, 'YASARA Structure' (www.yasara.org), enables real-time simulations with standard (e.g. AMBER-03; Duan et al., 2003), or various custom force fields (NOVA, YAMBER, YASARA; Krieger et al., 2002;2004;). All structures were energyminimized in pre-equilibrated simulation boxes filled with TP3 water, and sodium and chloride ions added to a final concentration of approximately 150 mM. The simulation runs were executed using a 2 femtosecond time step in a fully solvated atomistic production mode without restraints employing an AMBER03 force field (Duan et al., 2003) at 300 K (27°C) at pH 7.0. The solvent density was kept constant at 0.997 g/l. Longrange Coulomb interactions were calculated using the Particle Mesh Ewald algorithm with periodic boundary conditions (cut-off 7.86Å; Essman et al., 1995).

Constructing a personal high-performance MD simulation environment
In practical terms, the MD workstation allows the atomistic simulation of complex macromolecular assemblies (e.g. complete RNAPs in fully hydrated simulation cells [~300, 000 atoms] to be carried out at a rate of 1 nanosecond per 28 hours computation (average CPU usage ~75%), which compares favourably with resources typically allocated to average users from supercomputing facilities.
The resulting MD trajectories were analyzed with a series of analysis packages to study the spatial and temporal stability of secondary structures, distance relationships, and correlated/concerted conformational movements using YANACONDA macro scripts (http://www.yasara.org/yanaconda.htm). The structures shown in this text were generated using PyMOL (The PyMOL Molecular Graphics System, Version 1.2r3pre, Schrödinger, LLC).

High-throughput molecular dynamics simulations of the RNAP link domain
On high-performance computational platforms the simulation of small domains is relatively inefficient because the multi-core CPU processing power is not used to the fullest extent under such circumstances. On the other hand, running multiple simulations at the same time in parallel applications allows more effective use of processing power, but the reading in and out of data for the different simulations wastes time that otherwise would be available for simulations. We have come up with a simple solution to this problem which allows simulations of multiple domains to run in parallel, but within a single application. Construction of a simulation cell that contains multiple copies of the target structure as a regular threedimensional array allows an increased number of small domains to be simulated as a large structure (Fig. 4). The linear dimensions of simulation cells in any particular direction are often limited by the simulation software used, so we typically use three-dimensional clusters that contain the simulated domains in a crystal-like array or cube structure.
The domains within such an array can either be identical copies, thus allowing the systematic sampling of the conformational properties of a particular domain within a single run, or could represent different (e.g. containing various mutations) versions. The presence of multiple copies within the same simulation cell automatically guarantees comparability because the same simulation parameters (e.g. ionic concentration, temperature, pressure and force field) will be applied to all domains in a consistent manner. A particular concern applying to such an arrangement is that the individual domains must be spaced sufficiently far apart to avoid artefactual contacts between the different molecules. Since the domains move randomly during MD simulation (diffusion), we automatically reposition the domains to their original position at fixed intervals (typically after every nanosecond simulation time), thus avoiding any 'clashes' within the simulation cell. Fig. 4. A three-dimensional array of copies of the Link domain arranged for highthroughput MD simulation.
We will illustrate how this procedure works in practice using the Link domain described above. For the experiments described below we ran three independent repeats of arrays containing 18 distinct copies of the Link domain (as illustrated in Fig. 4) from the yeast RNAPII structure (PDB 2E2H; Wang et al., 2006) for 20 nanoseconds simulation time each, thus resulting in data representing just over one microsecond of total simulation data. Snapshots of the simulation cell were stored at ten picosecond intervals and stored for further processing and analysis.

The link domain contains a surprisingly stable -helical core
When subjecting individual domains to molecular dynamics simulations, a common concern is that a domain may not be intrinsically stable, resulting in rapid unfolding and loss of defined structure. We expected this to happen in the case of the isolated Link domain, which consists only of a relatively short -helical region and irregularly structured N-and C-termini (Fig. 5).
www.intechopen.com Surprisingly, this small domain turned out to be exceptionally stable, with an essentially invariant -helical 'core' (amino acid residues spanning serine 764 to alanine 772) remaining in constant -helical conformation throughout most, or even all (asparagine 767 to glutamine 770) of the 20 nanosecond simulation periods (Fig. 6). In contrast, the irregularly structured regions, encompassing proline 757 to glutamine 763 undergoes a variety of conformations, including (random) coil, short-lived  and 3 10 helix and turns (no -sheets were observed at any stage).
From this initial analysis we can therefore conclude that the Link domain consist of a nanomechanically robust portion and a highly flexible part (Fig. 7). A strategically placed proline reside (proline 765) delimits the N-terminal boundary of the -helix (Cochran et al., 2001;Rey et al., 2010;Tendulkar & Wangikar, 2011) and thus determines the position of the sharp kink present in the L-shaped Link domain (Figs. 5 & 7).
It is remarkable that, despite the apparently random movement of the N-terminal portion of the Link domain (proline 757 to glutamine 763), the overall 'L-shape' of the domain is nevertheless maintained in more than 90% of all the simulations carried out so far (Fig. 7). This observation suggests the presence of interacting side-chains that stabilize the intramolecular positioning of the mobile N-terminus relative to the rigid C-terminal -helix.  At the beginning of the simulation (Fig. 8, t=0), the R766 side-chain engages in extensive van der Waals and/or hydrophobic interactions with Q763, which -due to the -helical repeat, occupies a similar orientation. The side-chains appear tightly lined up relative to each other and even move synchronously (Fig. 8, compare t=0 and t=30 ps). R766 can, however, make alternative contacts with Q770, whose side-chain (similar to Q763) occupies the same side of the -helix (Fig. 8, t=120 ps). While H761 is not initially involved in these interactions of R766 with Q763 or Q770, there is a clear tendency at a later stage of the simulation for it to www.intechopen.com approach R766 (Fig. 8, t=350 ps). A conformational change moving Q763 out of the way appears necessary for the kink between the N-terminal flexible part and the -helix to become more distinct. This is followed by a tightening of the contacts between H761 and R766 (Fig. 8, t=590 ps). Strong interactions involving side-chains of histidine and arginine have been documented previously (Heyda et al., 2010) and it therefore appears that this observation is based on similar principles. In the final stages we observe additional stabilization of the H761-R766 interaction through the adjacent glutamine side-chains that were already previously identified as R 766 interaction partners (Fig. 8, t=1310 ps). This association, however, appears to be reversible within the nanosecond time frame of the MD simulation since we observe a distinct break-up of the side-chain cluster towards the end of the run (Fig. 8, t=1910 ps and t= 19970 ps).

Conclusion
The gap between reality (conformational changes occurring in complex molecular machines in the millisecond to second range) and currently existing methodology (atomistic simulation of macromolecular structures in the nano-to microsecond range) is large and prevents us from obtaining a comprehensive understanding of the detailed functions of molecular machines that play fundamental role in key cellular processes. While it is very likely that this gap will narrow with the further development of computing power, viable intermediate solutions have to be found that can bridge the gap in the meantime. The approach outline here is aimed at breaking up complex macromolecular structures into components that provide distinctly identifiable nanomechanical functions which can be extensively studied using MD simulations. By gaining an understanding of the conformational spectrum of the individual components we can obtain two categories of complementary insights: the structural changes that a component is naturally capable of and the conformations that are either only achievable through energetic input or are unlikely to occur at all (this concept is already familiar to many molecular biologists in simpler systems, such as the conformational space allowed to a amino acids as represented by Ramachandran plots).
The work presented here has provided many new insights into the nanomechanical property of a newly identified structure of RNAP, the Link domain. As briefly outlined in the working model presented in Fig. 3, the Link domain appears to be strategically placed to act as an rNTP sensor, capable of transmitting the occupancy state of the catalytic site to the Bridge Helix. The Bridge Helix itself is a conformationally dynamic domain and almost certainly plays a critical role in translocating the nucleic acid substrate through the active site after successful rNTP incorporation (Weinzierl 2010a;2010b). It thus appears essential for the translocation mechanism (Bridge Helix) to be allosterically linked to a sensor capable of reporting the successful completion of the incorporation of a nucleotide building block.
Although the initial identification of the Link domain was made entirely for structural reasons (i.e. its proximity to the N-terminal hinge region of the Bridge Helix), many of the observations emerging from the extensive molecular dynamic characterizations described here strengthen various aspects of the working model. The MD results show very clearly that the C-terminal -helical portion of the Link domain is very stable and conformationally essentially invariant. A proline residue, P765, separates this rigid portion of the domain from a much more flexible part, which also contacts the N-terminal portion of the Bridge Helix (probably with a key involvement of a portion of the F-Loop ( Fig. 1; Miropolskaya et al., 2009;). An arginine residue, R766, appears to play multiple critical roles. Apart from contacting the -phosphate of the rNTP in the catalytic site (Fig. 2), our MD results suggest that R766 may have a decisive influence on the conformation of the Link domain. Strong contacts (Heyda et al., 2010) between H761 and R766 determine the kinking of the L-shaped Link domain (Fig. 8), suggesting that the R766 side-chain makes exclusive contacts with the rNTP or H761, thus providing a plausible model for the transmission of the sensor information to the Bridge Helix (Fig. 3).
The exploration and definition of conformational space accessible to an individual protein domain provides new insights and better-informed working hypotheses for future experimental work. We will shortly initiate a high-throughput mutagenesis screen aimed at replacing each of the residues of the Link domain in the structurally highly orthologous archaeal RNAP (Fig. 5) in order to test the preliminary insights obtained from the computer modelling.

Acknowledgment
The research presented here was funded by MRC project grant G1100057 to ROJW.