Protein-protein docking is a molecular modeling strategy to predict biomolecular complexes and assemblies. Traditional protein-protein docking is performed at atomic resolution, which relies on X-ray and NMR experiments to provide structural information. When dealing with biomolecular assemblies of millions of atoms, atomic description of molecular objects becomes very computational inefficient. This article describes a development work that introduces map objects to molecular modeling studies to efficiently derive complex structures through map-map conformational search. This method has been implemented into CHARMM as the EMAP command and into AMBER in its SANDER program. This development enables molecular modeling and simulation to manipulate map objects, including map input, output, comparison, docking, etc. Through map objects, users can efficiently construct complex structures through protein-protein docking as well as from electron microscopy maps according to low map energies. Using a T-cell receptor (TCR) variable domain and acetylcholine binding protein (AChBP) as example systems, we showed the application to model an energetic optimized complex structure according to a complex map. The map objects serve as a bridge between high-resolution atomic structures and low-resolution image data.
- protein-protein docking
- molecular modeling
- electron microscopy
- molecular image
- computational tool
- protein complexes
- biomolecular assembly
Protein-protein docking has been a powerful approach to provide structural insights into biological procedures at atomic level [1, 2, 3, 4]. Based on the structural information provided by X-ray and NMR, as well as constraints derived from biological data such as mutagenesis observations, protein-protein docking can produce structures and interactions of protein complexes, which helps to illustrate structural mechanism of many biological processes [5, 6, 7].
New development in experimental technologies, such as electron microscopy, provides an approach to obtain low-resolution structure information of large molecules and their assemblies . Extracting structure information from these low-resolution maps and obtaining atomic interpretation of the large biomolecular assemblies become a central piece of modern structural biology . This requires molecular modeling to be conducted on these low-resolution maps, as well as high-resolution atomic structures, to maximize the capability in structural biology studies.
On the other hand, as the development of structural biology, molecular modeling is applied to larger and larger biomolecular machineries. As the biology systems become larger, atomic description of molecular system becomes very inefficient and time-consuming. Millions of atoms and their chemical structural become redundant in many of modeling studies. Therefore, it would be very efficient if large biomolecules are simplified to shape objects while ignoring their internal structures. Although molecular flexibility plays important roles in biological activities, in many cases, molecular geometric shapes plus surface properties are sufficient to describe many cellular processes such as molecule assembling and protein-protein binding. In these cases, it is satisfactory to describe large molecules as rigid domains. In some cases certain internal flexibility can be simplified to the motion of several rigid fragments. Therefore, molecular modeling of large biomolecular machinery can be achieved efficiently by simplifying biomolecules with simplified shape objects.
In this work, we introduce map objects to represent molecules with fixed structures to achieve efficient molecular modeling of large molecular systems and to efficiently derive structural information from low-resolution experimental maps. Map objects are designed to work with high-resolution atomic structures so that low-resolution maps are interchangeable with high-resolution atomic structures. A map object represents a property distribution over certain space, while a molecular structure is generally described by the coordinates of a set of atoms. This work describes an efficient approach to handle and manipulate map objects so that efficient molecular modeling of large systems can be performed.
2. Method and design
We introduce map objects to represent space occupation of molecular structures. Unlike chemical description of molecules that contain atoms that are linked by chemical bonds, a map does not have internal chemical structures. Instead, a map represents a spatial distribution of certain properties, typically electron density. This distribution generally is described as scalar values at discrete grid points due to irregularity of the distributions and limit in storage. Figure 1 shows a cartoon of a map object. As can be seen, a map objects contains three components.
2.1 Grid definition
The grid of a map object is defined by its starting position, x0, y0, and z0; grid intervals, dx, dy, and dz; and grid point numbers, nx, ny, and nz.
2.2 Molecular reference
Because we use map to represent a molecular structure, we use molecular reference to record which molecule this map is representing. Through reference molecule, map object and molecule coordinates become interchangeable.
2.3 Distribution properties
The distribution property describes the distribution of given property over the space covered by the grid points. This can be the electron density measured in experiment or properties generated from reference molecules.
Here are several typical types of map objects used in molecular modeling:
1. Electron density maps
This is the most widely used map type, which describes the electron density over the space. This type of map is often determined from electron microscopy. It can also be derived from molecular structure based on atomic coordinates and type.
2. Electric charge maps
This type of map is solely derived from molecular structures based on a force field. The partial charges of atoms are distributed to their nearest grid points.
3. Electric field maps
Because electrostatic interactions are long ranged, it is difficult to have a map to cover a very large space. Instead, we propose to use transformed coordinates:
where x is the real space coordinate, X is the reduced coordinate, and b is a constant controlling the reduction. X will take a range of (−1, 1) to represent a real space of x over .
4. VDW core maps
The VDW cores provide boundary to avoid overlapping between molecules. The core map is constructed based on the accessibility of a molecular structure. The surface has low core index, while the center has high index (the core indices are shown as the number in each grid box in Figure 1).
2.4 Rigid domains
Because a map object contains a large amount of data, it is inconvenient to perform movement on a map itself. For example, a rotation of a map object will result in the rectangular space not parallel to the coordinate axis anymore, and new boundaries and distributions need be updated accordingly. In addition, a real system often contains more than one copy of some molecular species, and it would be very memory costing to have a map object for each copy of these species. Instead, we define a rigid domain to represent a copy of the molecular species. A rigid domain contains only the identity of the map object it represents and the position and orientation vectors related to the map object, and can be manipulated easily. A rigid domain can be understood as a mobile representation of a map object. Each rigid domain has a unique identity, and many rigid domains can represent the same map object. Figure 2 shows the map objects of the α-chain and β-chain of a TCR variable domain and their manipulation through rigid domains.
Each rigid domain is defined by its map ID and its translation vector, T, and rotational matrix, U:
The operation, translation, and rotation are done by applying these vectors:
and many operations can be accumulated without losing accuracy:
When a map object is created, a rigid domain at origin is created for it, with
A rigid domain can also be created for a molecular structure by linking the structure to a given map object according to the translation vector and rotation matrix from the reference coordinates to the linked structure.
This equation also provides a way to update the structure coordinates according to the position and orientation of a rigid domain.
2.5 Map comparison
Map comparison provides a target function for fitting one map into another map. Four types of cross-correlation functions  are provided for comparison between map objects, which are listed below.
1. Density correlation (DC)
represent the average and fluctuation of the density distribution. is the density correlation of map m to map n. Figure 3 shows two comparison maps in two dimensions. is calculated according to map m’s dimension and grid properties. The calculation runs over all grid points of map m, which are transformed and interpolated into grid points of map n to get corresponding density properties.
2. Laplacian correlation (LC)
where is the Laplacian filtered density derived from density distribution by the following finite difference approximation:
is the Laplacian correlation of map m to map n. Similar to , is calculated according to map m’s dimension and grid properties.
3. Core-weighted density correlation (CWDC)
where represents a core-weighted average of distribution property :
where wmm is core-weighting function of core m to core n. Three parameters, a, b, and kc, control the dependence of the function to the core indices. We chose a = 2 and kc = 1 in this work calculations, and b is set to a very small value, say 10–6, to ensure when and . Therefore, only the core region of map m has contribution to the core-weighted density correlation, .
4. Core-weighted Laplacian correlation (CWLC)
uses Laplacian filtered density, instead of the density in the calculation. Again, only the core region of map m has contribution to the core-weighted Laplacian correlation, .
2.6 Molecular interactions between map objects
Energetics of molecular systems is the basis of molecular modeling. Calculation of molecular interaction using map objects is the crucial step for a successful modeling or simulation study. For atomic objects interaction calculation is pairwise and is very time-consuming for large molecular assemblies. For map objects, we propose to use field interactions that can be calculated much more efficiently. We define four types of interactions to describe interaction between map objects: electric field interaction, surface charge-charge interaction, VDW interaction, and desolvation interaction as described below.
1. Electric field interaction
The electric field around a molecule is described by the field map with scaled coordinates. The interaction with the field is
where is the charge at the charge map 1 and is the electric field from object 2, which depends on the dielectric constant, ε, and distances from each grid points of object 2. The dielectric constant, ε = 80, is used for most cases.
2. VDW interaction
Surface interaction brings the surface together while avoiding core overlapping. The surface can be identified by low core index. We propose to use the following equation to make the surface contact favorable while overlapping unfavorable:
where C1 and C2 are the core indices of molecular 1 and 2 at each grid point and δ1 and δ2 are the grid intervals of map. 1 and 2, respectively. υ is the VDW interaction parameter.
3. Surface charge: charge interaction
Upon binding, the surface charge groups will contact with each other. The surface charge–charge interaction is different from the charge-field interaction which is screened by the solvent environment:
where b is the surface interaction parameter.
4. Desolvation interaction
Before and after binding, the surface charge groups change from the solvation state to the buried state and will create an energy gain we termed as desolvation energy:
where s is the desolvation parameter.
These interaction parameters used to define the interactions, Eqs. (20)–(23), can be derived from atomic force field or from experimental data. By fitting into energies calculated with the CHARMM force field , we obtained the parameters υ = 0.14 kcalÅ, b = 330 kcal/(C2Å), and s = 70 kcal/(C2Å2).
2.7 Conformational search
We implemented the grid-threading Monte Carlo searching algorithm  for robustly fitting rigid domains to a target map. The grid-threading Monte Carlo (GTMC) search is a combination of the grid search and Monte Carlo sampling. As shown in Figure 3, the conformational space is split into grid points, and short Monte Carlo searches are performed to identify local maximums around the grid points. The global maximum is identified among the local minimums.
3. Results and discussions
3.1 Complex structures from EM maps
Deriving high-resolution molecular assembly structures from microscopy maps are a major application of the map approach. This method has been successfully applied into several experimental studies [12, 13]. Figure 4 illustrates the steps to perform a fitting of high-resolution molecular structure into electron microscopy maps. We chose a T-cell receptor (TCR) variable domain (PDB code: 1a7n) as an example complex to illustrate the modeling process with map objects. The TCR variable domain is a complex of two chains, α-chain and β-chain. The two chains are first blurred into maps of the same resolution (here 15 Å) as the EM map. Then each map is fitted into the EM map to get a complex map. The complex map is projected back to atomic structures, which is the complex structure we are looking for. The root mean square (rms) deviation of the fitting result from X-ray complex is 3 Å.
The structure obtained from map fitting generally is not optimized in atomic details. There are often atom overlaps or improper spacing between components. This structural mismatch can be removed by many modeling methods available in CHARMM [14, 15], such as energy minimization and simulated annealing, if the fitting result is very close to the right structure. After the minimization, the rms deviation is 0.97 Å.
3.2 Complex structures from energy optimization
The energy function is designed to have the minimum at the binding conformation. Therefore, it is possible to determine complex structures through minimizing the map interaction energy in cases where the EM complex map is not available. It should be noted that the map object assumes certain rigidity of a molecular object. Certain flexibility of loop region can be accommodated by the low-resolution characters, while large flexibilities like domain movement should be dealt with multiple map objects. Recently, this method was successfully applied in modeling of the peroxiredoxin (Prx) complex .
Figure 5 shows the steps to perform an energy-based conformational search to determine complex structures. In this case, no EM map is used. The TCR chains are transferred into property maps that allow interaction between map objects to be calculated. By searching the minimum interaction energy conformation, the complex structure is determined. The final result is only 0.57 Å away from the X-ray structure. It should be noted that this is an ideal case that the structure of the two chains is taken from the complex and there is no conformational change in this fitting process.
It is interesting to see that energy-based approach to derive complex structure takes account of molecular structure and energetic information of molecules. Figure 6a–c shows the electric field, shape, and charge maps of the two chains and their complex. Obviously, the two chains are binding together to have the low potential region matching the high potential one, to have shape complementary to each other, and to have surface charge overlapped oppositely.
Figure 6a shows the electric field of TCR α-chain and β-chain and their complex. Please notice the reduced coordinates are used for the map. The range of (−1, 1) for the reduced coordinates covers the range of (−∞, ∞) for the regular coordinates. The α-chain has negative field near its top-left and bottom-right areas and positive field near its lower-right and upper-left areas. Correspondingly, the β-chain has positive field at its top-right area and negative field at its bottom area, which are complementary to the α-chain. As a result, the complex map has negative field at its top and bottom areas and positive field at its left and right areas. The symmetric distribution of the field of the complex indicates its stability.
Figure 6b shows the core indices of TCR α-chain and β-chain and their complex. The high values in the core indices indicate the region further away from surface and are difficult to access. The α-chain and β-chain that show complementary shape are the binding surface. Their complexes are the two maps matching together.
Figure 6c shows the electric charge distribution of TCR α-chain and β-chain and their complex. The α-chain map shows more negative charges at the right side, while the β-chain shows more positive charges at its left side. The complex map shows the two chains come together with negative patches contacting positive patches. Overall, these map interactions provide energetic basis for protein-protein docking as shown in Figure 5.
As a further example of protein-protein docking, we show the procedure to build the pentamer of acetylcholine binding protein (AChBP). The monomers are docked one by one to form dimer, trimer, tetramer, and pentamer (Figure 7). The resulting pentamer is only 0.73 Å away from the X-ray structure, 1I9B. The electric field maps during the building process are shown in Figure 8. From the field map of the monomer, we can see the most positive field is at the top-right area and the most negative field is at the bottom-left area. A dimer is formed by matching the positive area of the second monomer with the negative area of the first one. The third monomer’s positive area fits into the most negative area of the dimer to form the trimer. Similarly, the fourth and fifth monomers are docked to form the tetramer and pentamer. The map interaction limits the way of docking monomers and allows correct assemblies to be built.
Map objects cannot only be used to model rigid proteins , they can also be used for targeted conformational search such as flexible fitting and restrained molecular dynamics [18, 19]. Map objects provide an efficient bridge from molecular systems to large-scale bodies such as cells and organelles.
This work designed and developed a computational tool to manipulate map information for molecular modeling studies. Protein–protein docking can be efficiently performed with map objects. This tool is implemented into CHARMM, as a module, EMAP, and into AMBER in its SANDER program. Our design and implementation make it very flexible and efficient to perform various manipulations of map objects and to perform some routine task related to map data. This module enables user to construct macromolecular assemblies by docking high-resolution X-ray or NMR structures to low-resolution cryo-electron microscopy maps. And when there is no EM map available, this module allows user to search for low-energy complex structures, for example, in protein-protein docking. By replacing high-resolution atomic structure with low-resolution map objects, this work creates a convenient approach to extend the molecular modeling studies to large biomolecular machinery. This map-based approach can extend modeling and simulation objects from molecular systems to macroscopic systems like cells and bacteria.
This research was supported by the Intramural Research Programs of National Heart, Lung, and Blood Institute (Z01 HL001027-34).
Conflict of interest
The authors declare there is no conflict of interest in publishing this work.