Examples of free tools for building homology models.
Proteins are macromolecules present in all living beings; they adopt a wide variety of structures and perform many complex and diverse functions. They are polymers of amino acids, also called polypeptides, synthesized in the cells of living organisms. Determining the three-dimensional structure of a protein is crucial for understanding its function. However, experimental techniques for structural elucidation, such as X-ray crystallography and nuclear magnetic resonance (NMR), are complicated and expensive. In this context, computational techniques for building structural models are a useful and viable alternative in many situations. Among computational techniques, homology modeling, also known as comparative modeling, is the most widely used.
Proteins are organized at different levels of structural complexity: 1) primary structure; 2) secondary structure; 3) tertiary structure; 4) quaternary structure (Figure 1). The primary structure of a protein is the linear sequence of the amino acids that compose it, with one end carrying the free amino group of the first amino acid in the chain (N-terminal) and the other end carrying the free carboxyl group of the last amino acid in the chain (C-terminal). The primary structure can be represented by a string of letters, one for each amino acid. The secondary structure of a protein is determined by the primary sequence, which governs how the monomers (amino acids) arrange themselves relative to each other and to the solvent, forming standard structural elements of three kinds: turns, helices and β-sheets. The way in which these secondary structures are organized in three-dimensional space is called the tertiary structure, which is associated with the biological function of the molecule in question. In multimeric protein complexes (dimers, trimers, tetramers, etc.) there is also a quaternary structure, which is the oligomeric state formed by the association of several chains, each with its own tertiary structure.
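As a small illustration of how a primary structure is written as a string of letters, the following Python sketch converts three-letter residue names into the standard one-letter code, reading from the N-terminal to the C-terminal. The function name and the example peptide are hypothetical and serve only as an illustration.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(residues):
    """Return the one-letter sequence, N-terminal first.

    Residue names that are not in the table are written as 'X'.
    """
    return "".join(THREE_TO_ONE.get(name.upper(), "X") for name in residues)


# Hypothetical five-residue peptide, listed from N-terminal to C-terminal.
peptide = ["Met", "Lys", "Thr", "Ala", "Tyr"]
print(to_one_letter(peptide))  # prints MKTAY
```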
There are three types of computational modeling for predicting protein structures: by
However, in addition to the identity and similarity between the amino acids, other parameters must be considered when choosing a good template, such as the resolution, in angstroms (Å), of the crystallographic structure and the percentage of alignment coverage (Figure 2). The lower the resolution value of a structure, the better its quality. The average resolution of the structures available in the PDB (Protein Data Bank) is around 3.5 Å, while structures below 2.0 Å are considered to have excellent resolution and represent less than 10% of the entries in the PDB. The higher the percentage of coverage of the alignment between the target protein (the protein to be modeled) and the template (mold), the better. Alignments that cover more than 90% of the residues tend to have high scores and are considered excellent (Figure 2).
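The identity and coverage criteria can be made concrete with a short Python sketch. It assumes the target and template are supplied as two gapped rows of equal length from a pairwise alignment, with "-" marking a gap. Identity is taken here as the fraction of identical residues among the aligned columns, and coverage as the fraction of target residues that are aligned to a template residue. Other definitions are in use, so the function name and these conventions are assumptions of the example, not the procedure of any specific server.

```python
def alignment_stats(target_aln, template_aln):
    """Return (percent identity, percent coverage) for two gapped rows."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")

    target_len = 0  # residues in the target (gaps excluded)
    aligned = 0     # columns where both rows have a residue
    identical = 0   # aligned columns with the same residue

    for t, m in zip(target_aln, template_aln):
        if t == "-":
            continue
        target_len += 1
        if m != "-":
            aligned += 1
            if t == m:
                identical += 1

    if target_len == 0 or aligned == 0:
        raise ValueError("no aligned residues to evaluate")

    identity = 100.0 * identical / aligned
    coverage = 100.0 * aligned / target_len
    return identity, coverage


# Hypothetical alignment: one substitution and one gap in the template.
identity, coverage = alignment_stats("MKTAYIAKQR", "MKSAY-AKQR")
print(f"identity = {identity:.1f}%, coverage = {coverage:.1f}%")
```

With this toy alignment, 9 of the 10 target residues are aligned and 8 of those 9 are identical, so the coverage sits exactly at the 90% mark mentioned in the text.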
An important point to check in alignments is the presence of sequence gaps. A gap means that residues present in one sequence have no counterpart in the other, that is, amino acids have been inserted into or deleted from some part of one of the sequences (Figure 3). The number and size of the gaps in an alignment are crucial to the final quality of the models. The more numerous and the longer the gaps, the less reliable the models are and the greater the chance of generating structural artifacts. Therefore, when choosing a template, it is essential that the researcher be aware of the gaps present in the alignment.
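The number and size of gaps can be summarized with a few lines of Python. This is only a sketch under the assumption that gaps are written as "-" characters in an aligned sequence row; the function names and the example sequence are hypothetical.

```python
import re


def gap_runs(aligned_seq):
    """Return the lengths of the consecutive '-' runs in an aligned row."""
    return [len(run) for run in re.findall(r"-+", aligned_seq)]


def gap_summary(aligned_seq):
    """Return (number of gaps, longest gap, total gap columns)."""
    runs = gap_runs(aligned_seq)
    return len(runs), max(runs, default=0), sum(runs)


# Hypothetical aligned target row with two gaps.
count, longest, total = gap_summary("MKT--AYIAKQR-LG")
print(count, longest, total)  # prints 2 2 3
```

Running the same summary on both rows of an alignment shows where the target lacks residues and where the template does, which helps to compare candidate templates.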
Once the template has been defined, we proceed to building the three-dimensional model. The files required for modeling are submitted to specific programs or servers. Model building consists of superimposing the backbone carbons of the target protein on those of the template protein, using the alignment information to match the equivalent amino acids. There are currently numerous free tools for building three-dimensional models (Table 1).
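How an alignment identifies the equivalent amino acids can be sketched in Python as follows. The example assumes two gapped rows of equal length with "-" as the gap character and returns pairs of 0-based residue indices (target, template). It illustrates only the bookkeeping step and is not the algorithm of any particular modeling tool; the function name is hypothetical.

```python
def equivalent_residues(target_aln, template_aln):
    """Return (target_index, template_index) pairs for aligned residues.

    Indices are 0-based positions in the ungapped sequences.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")

    pairs = []
    t_idx = 0  # position in the ungapped target sequence
    m_idx = 0  # position in the ungapped template sequence
    for t, m in zip(target_aln, template_aln):
        if t != "-" and m != "-":
            pairs.append((t_idx, m_idx))
        if t != "-":
            t_idx += 1
        if m != "-":
            m_idx += 1
    return pairs


# Hypothetical alignment: the template has one extra residue (S) and
# lacks a counterpart for the last target residue (Y).
print(equivalent_residues("MK-TAY", "MKSTA-"))
# prints [(0, 0), (1, 1), (2, 3), (3, 4)]
```

Target residues that do not appear in any pair (here the final Y) have no template coordinates to copy and must be built by other means, which is one reason why gaps lower the reliability of a model.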
2. Validation and refinement
Homology models are theoretical-computational approximations of the real protein structures, and therefore require validation and sometimes refinement and optimization. A very popular validation tool is the Ramachandran plot (Figure 4), which analyzes the stereochemical quality of protein structures.
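A Ramachandran plot displays, for each residue, the backbone dihedral angles φ (defined by the atoms C of the previous residue, N, Cα and C) and ψ (defined by N, Cα, C and N of the next residue). The Python sketch below shows one common way to compute a dihedral angle from four atomic positions using only the standard library. The helper names and the coordinates in the example are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Return the dihedral angle p0-p1-p2-p3 in degrees, in (-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates (in Å) of four consecutive backbone atoms.
c_prev = (1.0, 0.0, 0.0)
n = (0.0, 0.0, 0.0)
ca = (0.0, 0.0, 1.0)
c = (0.0, 1.0, 1.0)
print(f"phi = {dihedral(c_prev, n, ca, c):.1f} degrees")
```

Computing φ and ψ for every residue of a model and plotting one against the other gives the points of the Ramachandran plot.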
The Ramachandran graph analyzes the conformations of the
Other validation tools are energy assessments, both local and global. A tool for the global assessment of model quality is the ProSA-web (Protein Structure Analysis) server (https://prosa.services.came.sbg.ac.at/prosa.php) [6, 7], which uses a Z-score to compare the energy of a structure with those of experimentally solved proteins of equivalent size stored in a database (Figure 5).
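The idea behind a Z-score can be illustrated with a generic Python sketch: the score expresses how many standard deviations a value lies from the mean of a reference set. This is the textbook definition only and is not the exact procedure of the ProSA-web server. The energies below are invented numbers in arbitrary units, and the function name is hypothetical.

```python
import statistics


def z_score(value, reference):
    """Return how many standard deviations `value` lies from the mean
    of `reference` (population standard deviation)."""
    mean = statistics.mean(reference)
    sd = statistics.pstdev(reference)
    if sd == 0:
        raise ValueError("reference values have zero spread")
    return (value - mean) / sd


# Invented reference energies (arbitrary units) and a model energy.
reference_energies = [-8.0, -8.0, -4.0, -4.0]
model_energy = -9.0
print(f"Z-score = {z_score(model_energy, reference_energies):.2f}")
```

In this toy case the reference mean is -6.0 with a standard deviation of 2.0, so the model energy of -9.0 lies 1.5 standard deviations below the mean.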
For local quality analysis, the VERIFY3D server (https://servicesn.mbi.ucla.edu/Verify3D) is very useful. This type of analysis checks the quality locally, that is, for each residue of the model (Figure 6), making it possible to identify specific low-quality regions for further adjustment.
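Once per-residue scores are available, low-quality stretches can be located with a simple Python sketch such as the one below. It assumes a list of per-residue quality scores in which higher values are better. The window size and the 0.2 cutoff are illustrative assumptions, as are the function names and the scores themselves; the sketch does not reproduce the internal scoring of any server.

```python
def windowed_scores(scores, window=5):
    """Smooth per-residue scores with a sliding average.

    The window is truncated at the ends of the chain.
    """
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def low_quality_regions(scores, threshold=0.2):
    """Return 1-based inclusive (start, end) ranges scoring below threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores, start=1):
        if score < threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(scores)))
    return regions


# Invented per-residue scores for a ten-residue model.
scores = [0.5, 0.4, 0.1, 0.0, 0.1, 0.5, 0.6, 0.5, 0.1, 0.6]
print(low_quality_regions(scores))
print(low_quality_regions(windowed_scores(scores, window=3)))
```

Smoothing before thresholding suppresses isolated low-scoring residues and highlights contiguous regions, which are usually the more relevant targets for refinement.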
For model refinement, two techniques are particularly interesting: energy minimization and classical (atomistic) molecular dynamics. Energy minimization, also called geometry optimization, aims to find a set of atomic coordinates that avoids bad contacts and reduces the potential energy of the system. Some free servers are available for energy minimization of theoretical models, such as YASARA (http://www.yasara.org/minimizationserver.htm) and CHIRON (https://dokhlab.med.psu.edu/chiron/). Molecular dynamics is extremely effective for validating and refining theoretical models. This technique is based on the principles of classical mechanics and describes the atomic movements of a system through the integration of the Newtonian equations of motion. Thus, a molecular dynamics simulation of 5–10 nanoseconds is one of the most effective techniques for the optimization and validation of homology models. Software such as GROMACS and NAMD is useful for performing molecular dynamics calculations. Once optimized and validated, the theoretical model can be used for several purposes and can also be made available in public repositories, such as the PMDB - Protein Model DataBase (http://srv00.recas.ba.infn.it/PMDB/) and the SWISS-MODEL repository (https://swissmodel.expasy.org/repository).
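The two refinement ideas can be illustrated on a toy system in Python. The sketch below uses a single particle in a one-dimensional harmonic potential, in arbitrary units. It performs energy minimization by steepest descent and integrates Newton's equation of motion with the velocity Verlet scheme, a commonly used integrator. The potential, parameters and function names are assumptions chosen for clarity; real force fields and packages such as GROMACS or NAMD are far more elaborate.

```python
K = 1.0      # force constant (arbitrary units)
X0 = 1.0     # position of the energy minimum
MASS = 1.0   # particle mass


def potential(x):
    return 0.5 * K * (x - X0) ** 2


def force(x):
    # Force is the negative gradient of the potential.
    return -K * (x - X0)


def minimize(x, step=0.1, tol=1e-8, max_iter=10_000):
    """Steepest descent: move along the force until it nearly vanishes."""
    for _ in range(max_iter):
        f = force(x)
        if abs(f) < tol:
            break
        x += step * f
    return x


def velocity_verlet(x, v, dt=0.01, n_steps=1000):
    """Integrate Newton's equation of motion; return the (x, v) trajectory."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / MASS   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / MASS   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


def total_energy(x, v):
    return potential(x) + 0.5 * MASS * v ** 2


print(f"minimized position: {minimize(3.0):.6f}")
traj = velocity_verlet(x=1.5, v=0.0)
drift = total_energy(*traj[-1]) - total_energy(*traj[0])
print(f"energy drift after {len(traj) - 1} steps: {drift:.2e}")
```

Minimization drives the particle to the bottom of the potential well, whereas the dynamics run keeps the total energy nearly constant while the particle oscillates, which is the behavior expected from a stable integration.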
Theoretical-computational models are fast and inexpensive to build and extremely versatile. There are countless possibilities for studying and using homology models. These structures can be used for drug screening, docking studies, development of new drugs and vaccines, elucidation of binding sites (catalytic and allosteric), molecular dynamics simulations, quantum studies, biomolecule engineering, and more.
The future of molecular modeling is fascinating and promising. With the advancement of computational tools, theoretical models tend to become increasingly accurate and reliable, contributing more and more to biological and biotechnological research and integrating various areas of knowledge with bioinformatics and computational biology.
The authors are grateful to the Federal University of Campina Grande and to the Federal Rural University of Pernambuco.
Conflict of interest
The authors declare no conflict of interest.