Open access peer-reviewed chapter

Introductory Chapter: Homology Modeling

By Rafael Trindade Maia, Magnólia de Araújo Campos and Rômulo Maciel de Moraes Filho

Submitted: November 25th 2020; Reviewed: December 10th 2020; Published: March 10th 2021

DOI: 10.5772/intechopen.95446

1. Introduction

Proteins are macromolecules present in all living beings, where they adopt diverse structures and perform a huge variety of complex functions. They are polymers of amino acids, also called polypeptides, synthesized in the cells of living organisms. Determining the three-dimensional structure of a protein is crucial for understanding its function. However, experimental techniques for structural elucidation, such as X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy, are complex and expensive [1]. In this context, computational techniques for building structural models are a very useful and viable alternative in many situations. Among them, homology modeling, also known as comparative modeling, is the most widely used in silico tool for obtaining structural protein models and can achieve excellent results [2].

Proteins are organized at different levels of structural complexity: 1) primary structure; 2) secondary structure; 3) tertiary structure; 4) quaternary structure (Figure 1). The primary structure of a protein is the linear sequence of the amino acids that compose it. By convention, the chain begins at the end carrying the free amino group of the first amino acid (N-terminus) and finishes at the end carrying the free carboxyl group of the last amino acid (C-terminus). The primary structure can therefore be represented as a string of letters, one per amino acid. The secondary structure is determined by the primary sequence, which governs how the monomers (amino acids) interact with each other and with the solvent, giving rise to three groups of standard local structures: α-helices, β-sheets and turns. The way these secondary structure elements are arranged three-dimensionally in space is called the tertiary structure, which is associated with the biological function of the molecule. In multimeric protein complexes (dimers, trimers, tetramers, etc.) there is also a quaternary structure: the oligomeric state formed by the association of several folded polypeptide chains (subunits), each with its own tertiary structure.

Figure 1.

Illustrative scheme for the structural complexity levels of proteins. Source: Google images.
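To make the idea of a primary structure "written as letters" concrete, the sketch below converts a list of three-letter residue names, ordered from N-terminus to C-terminus, into the standard one-letter string. It is a minimal illustration, assuming only the 20 standard amino acids; `THREE_TO_ONE` and `to_one_letter` are hypothetical names, and nonstandard residues would raise a `KeyError`.

```python
# Standard one-letter codes for the 20 proteinogenic amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(residues: list[str]) -> str:
    """Convert three-letter residue names (listed N- to C-terminus) to a one-letter string."""
    return "".join(THREE_TO_ONE[name.upper()] for name in residues)


print(to_one_letter(["Met", "Lys", "Thr", "Ala", "Tyr"]))  # MKTAY
```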

There are three types of computational modeling for predicting protein structures: ab initio (de novo) modeling, threading, and homology modeling. Homology modeling is based on the premise that the three-dimensional structure of a protein is much more conserved than its primary structure. Therefore, changes in the sequence do not necessarily change the structural domains of a protein, which can thus retain its original function. It is assumed that proteins from the same functional family maintain their structural domains, which is what makes comparative (homology) modeling possible. Two proteins are homologous when they share a common evolutionary ancestor; homologous proteins usually belong to the same functional family and, hypothetically, share the same structural motifs. When a protein has no experimentally elucidated three-dimensional structure but is homologous to a protein with a solved structure, a three-dimensional model of its sequence can be built using the known structure as a template. As a rule of thumb, a sequence identity of at least 25% between the two proteins is considered the minimum for building homology models. Sequence identities above 40% generally yield good models, while those above 50% tend to yield excellent theoretical structures [3].
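The identity thresholds above can be applied mechanically once the target and template are aligned. The sketch below is a minimal illustration, assuming gapped sequences of equal length with `-` as the gap character, and assuming identity is computed over columns where neither sequence has a gap (tools such as BLAST may use a different denominator). The function names are hypothetical, and the quality labels simply restate the 25%, 40% and 50% rules of thumb.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def expected_model_quality(identity: float) -> str:
    """Rule-of-thumb expectation for a homology model given target-template identity."""
    if identity > 50.0:
        return "excellent"
    if identity > 40.0:
        return "good"
    if identity >= 25.0:
        return "usable"
    return "below the usual 25% threshold"


pid = percent_identity("MKT-LLV", "MKSALLV")  # 5 identical out of 6 ungapped columns
print(round(pid, 2), expected_model_quality(pid))  # 83.33 excellent
```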

However, in addition to the identity and similarity between the amino acids, other parameters must be considered when choosing a good template, such as the resolution (in ångströms, Å) of the crystallographic structure and the percentage of alignment coverage (Figure 2). The smaller the resolution value, the higher the resolution and the better the quality of the structure. The average resolution of the structures available in the PDB (Protein Data Bank) is around 3.5 Å, while structures below 2.0 Å are considered to have excellent resolution and represent less than 10% of the entries in the PDB. The higher the percentage of coverage of the alignment between the target protein (the protein to be modeled) and the template, the better [4]. Alignments covering more than 90% of the target residues tend to have high scores and are considered excellent (Figure 2).

Figure 2.

Example of a BLASTp alignment of a Leishmania infantum ATP synthase sequence against the PDB database. The coverage percentage (red) and identity (black) of each alignment are highlighted. Source: authors' data.
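The template-selection criteria just described (identity, coverage and resolution) can be combined into a simple ranking. The sketch below is a minimal illustration, not a method from this chapter: `TemplateHit` and its field names are hypothetical, coverage is computed as the fraction of the target spanned by the aligned region, and the priority order (identity, then coverage, then resolution) is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    name: str           # hypothetical template identifier
    identity: float     # percent identity to the target
    query_start: int    # 1-based first aligned target residue
    query_end: int      # 1-based last aligned target residue
    resolution: float   # crystallographic resolution in Å (smaller is better)


def query_coverage(hit: TemplateHit, query_length: int) -> float:
    """Percentage of the target sequence spanned by the alignment."""
    return 100.0 * (hit.query_end - hit.query_start + 1) / query_length


def rank_templates(hits: list[TemplateHit], query_length: int,
                   min_identity: float = 25.0) -> list[TemplateHit]:
    """Drop hits below the identity cut-off, then sort best first."""
    usable = [h for h in hits if h.identity >= min_identity]
    return sorted(usable, key=lambda h: (-h.identity,
                                         -query_coverage(h, query_length),
                                         h.resolution))


hits = [
    TemplateHit("tplA", 45.0, 1, 300, 2.8),
    TemplateHit("tplB", 45.0, 1, 480, 1.9),
    TemplateHit("tplC", 20.0, 1, 500, 1.2),  # below 25% identity: discarded
]
print([h.name for h in rank_templates(hits, query_length=500)])  # ['tplB', 'tplA']
```

A weighted score could replace the lexicographic sort; the tuple key simply makes the priority order explicit.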

Something important to note in alignments is the presence of gaps. A gap means that residues present in one sequence have no counterpart in the other, that is, amino acids that have been inserted into or deleted from one of the sequences (Figure 3). The number and length of the gaps in an alignment are crucial to the final quality of the models: the more numerous and longer the gaps, the less reliable the models and the greater the chance of generating structural artifacts. Therefore, when choosing a template, it is essential that the researcher be aware of the gaps present in the sequences.

Figure 3.

Alignment between two proteins (query/Sbjct) showing the presence of 8 gaps (red) in three different sections (green). Source: authors' data.
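Gaps can be summarized in the same terms used for Figure 3: how many gap positions there are and how many separate stretches they form. The sketch below is a minimal illustration, assuming both sequences are already aligned with `-` as the gap character; `gap_runs` and `gap_summary` are hypothetical helper names, not part of any BLAST output format.

```python
import re


def gap_runs(aligned: str) -> list[int]:
    """Length of each run of consecutive gap characters ('-') in one aligned sequence."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned)]


def gap_summary(query_aln: str, subject_aln: str) -> dict[str, int]:
    """Count gap stretches and gap positions across both aligned sequences."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have equal length")
    runs = gap_runs(query_aln) + gap_runs(subject_aln)
    return {
        "gap_openings": len(runs),           # number of separate gap stretches
        "gap_positions": sum(runs),          # total number of gap characters
        "longest_gap": max(runs, default=0),
    }


print(gap_summary("ACD---EFGH", "ACDKLMEF-H"))
# {'gap_openings': 2, 'gap_positions': 4, 'longest_gap': 3}
```

A template whose alignment has fewer and shorter gaps would, by the reasoning above, be preferred when other criteria are similar.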

Once the template has been defined, we proceed to building the three-dimensional model. The alignment and the template structure are submitted to a specific modeling program or server, which uses the alignment information to map equivalent amino acids and places the backbone α-carbons of the target protein on the corresponding positions of the template structure. There are currently numerous free tools for building three-dimensional models (Table 1).

Table 1.

Examples of free tools for building homology models.

Source: Google search.
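The core step of placing target residues on their equivalent template positions can be sketched as a walk along the alignment. The code below is a deliberately simplified illustration, assuming one Cα coordinate per template residue in sequence order; `transfer_ca_coordinates` is a hypothetical name. Real modeling tools also build side chains, model loops for target insertions (returned here as `None`) and optimize the full structure.

```python
from typing import Optional

Coord = tuple[float, float, float]


def transfer_ca_coordinates(target_aln: str, template_aln: str,
                            template_ca: list[Coord]) -> list[Optional[Coord]]:
    """Give each target residue the Cα position of its aligned template residue.

    Returns one entry per target residue; None marks a target residue aligned
    to a template gap (an insertion that would need loop modeling).
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    model: list[Optional[Coord]] = []
    t_idx = 0  # index of the next template residue
    for tgt, tpl in zip(target_aln, template_aln):
        coord = None
        if tpl != "-":
            coord = template_ca[t_idx]
            t_idx += 1  # advance even if the target has a gap here (deletion)
        if tgt != "-":
            model.append(coord)
    return model


ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
print(transfer_ca_coordinates("AC-DE", "ACGD-", ca))
# [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (11.4, 0.0, 0.0), None]
```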


2. Validation and refinement

Homology models are theoretical-computational approximations of the real protein structures, and therefore require validation and sometimes refinement and optimization. A very popular validation tool is the Ramachandran plot (Figure 4), which analyzes the stereochemical quality of protein structures.

Figure 4.

Ramachandran plot for the SARS-CoV-2 NSP9 replicase (PDB ID: 6w4b). Red: most favored regions; yellow and beige: allowed regions; white: disallowed regions. Source: authors' data.

The Ramachandran plot analyzes the backbone dihedral angles phi (φ) and psi (ψ) of each residue, which describe rotation about the N–Cα and Cα–C bonds flanking the peptide bonds, and places each residue in a region of the plot. Residues outside the allowed regions (outliers) adopt unfavorable conformations because of collisions between atoms (steric clashes). As a rule of thumb, a good model should have at least 90% of its residues in the favored and allowed regions [5].
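The φ and ψ values plotted on a Ramachandran plot are ordinary torsion angles computed from four consecutive backbone atoms. The sketch below shows that computation in pure Python, plus the 90% check applied to per-residue region labels. The region labels (`"favored"`, `"allowed"`, `"outlier"`) are assumed to come from a validation tool; the code does not define the region boundaries itself, and the helper names are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle in degrees, in the range [-180, 180].

    phi(i) uses C(i-1), N(i), CA(i), C(i); psi(i) uses N(i), CA(i), C(i), N(i+1).
    """
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)  # normals of the two planes
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def fraction_favored_or_allowed(regions: list[str]) -> float:
    """Fraction of residues labelled 'favored' or 'allowed' by a validation tool."""
    if not regions:
        raise ValueError("no residues to evaluate")
    ok = sum(1 for r in regions if r in ("favored", "allowed"))
    return ok / len(regions)


print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)), 1))  # -90.0
labels = ["favored"] * 18 + ["allowed"] + ["outlier"]
print(fraction_favored_or_allowed(labels) >= 0.9)  # True
```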

Energy assessments, both local and global, are also used for validation. A tool for assessing the global quality of a model is the ProSA-web server (Protein Structure Analysis) [6, 7], which compares the energy of a structure with those of experimentally solved proteins of similar size through a Z-score (Figure 5).

Figure 5.

Comparative Z-score plot. The black dot represents the position of the analyzed protein relative to structures of similar size solved by X-ray crystallography (light blue) and nuclear magnetic resonance (dark blue). Source: authors' data.
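A Z-score expresses how many standard deviations a value lies from the mean of a reference set, which is what lets a model be placed among experimental structures in a plot like Figure 5. The sketch below shows only this generic standardization and a range check; it is not ProSA-web's internal procedure, which derives the energies from its own knowledge-based potentials. The reference values and function names are hypothetical.

```python
from statistics import mean, stdev


def z_score(value: float, reference: list[float]) -> float:
    """Standard score of `value` relative to a reference sample: (x - mean) / sd."""
    if len(reference) < 2:
        raise ValueError("need at least two reference values")
    return (value - mean(reference)) / stdev(reference)


def within_reference_range(model_z: float, reference_z: list[float]) -> bool:
    """True if the model's Z-score lies inside the range spanned by reference Z-scores."""
    return min(reference_z) <= model_z <= max(reference_z)


reference_energies = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical values
print(z_score(3.0, reference_energies))  # 0.0
print(within_reference_range(-7.5, [-10.0, -5.0, -8.0]))  # True
```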

For local quality analysis, the VERIFY3D server is very useful. This type of analysis checks quality locally, that is, for each residue of the model (Figure 6), which makes it possible to identify specific low-quality regions for further adjustment.

Figure 6.

Local VERIFY3D quality plot of a stretch of the NS5 enzyme from Zika virus. Blue: averaged scores; green: raw scores. 93.93% of the residues have an averaged 3D-1D score ≥ 0.2 (at least 80% indicates a good structure). Source: authors' data.
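The percentage reported in Figure 6 comes from averaging per-residue 3D-1D scores over a sliding window and counting residues whose averaged score is at least 0.2. The sketch below reproduces that bookkeeping, assuming the raw per-residue scores are already available and assuming a 21-residue centered window that is truncated at the chain termini; the exact window handling in VERIFY3D may differ, and the function names are hypothetical.

```python
def window_average(scores: list[float], window: int = 21) -> list[float]:
    """Average each residue's score over a centered window (truncated at the termini)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    averaged = []
    for i in range(len(scores)):
        segment = scores[max(0, i - half): i + half + 1]
        averaged.append(sum(segment) / len(segment))
    return averaged


def percent_passing(scores: list[float], threshold: float = 0.2,
                    window: int = 21) -> float:
    """Percentage of residues whose averaged score is >= threshold."""
    averaged = window_average(scores, window)
    if not averaged:
        raise ValueError("no scores given")
    return 100.0 * sum(1 for s in averaged if s >= threshold) / len(averaged)


def looks_good(scores: list[float], cutoff: float = 80.0, **kwargs) -> bool:
    """Apply the 'at least 80% of residues pass' criterion."""
    return percent_passing(scores, **kwargs) >= cutoff


print(percent_passing([1.0, 1.0, -1.0, -1.0], window=3))  # 50.0
```

Low-scoring stretches in the averaged profile point to the regions that deserve further adjustment.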

For model refinement, two techniques are particularly useful: energy minimization and classical (atomistic) molecular dynamics. Energy minimization, also called geometry optimization, aims to find a set of atomic coordinates that avoids bad contacts and lowers the potential energy of the system. Free servers are available for applying energy minimization to theoretical models, such as YASARA [8] and CHIRON [9].

Molecular dynamics is also highly effective for validating and refining theoretical models. This technique is based on the principles of classical mechanics and describes the atomic motions of a system by numerically integrating Newton's equations of motion. A molecular dynamics simulation of 5–10 nanoseconds is one of the most effective techniques for optimizing and validating homology models. Software such as GROMACS [10] and NAMD [11] can be used to perform molecular dynamics calculations.

Once optimized and validated, the theoretical model can be used for several purposes and can also be made available in public repositories, such as the PMDB (Protein Model DataBase) and the SWISS-MODEL Repository.
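The two refinement techniques described above can be illustrated on a toy system. The sketch below assumes a hypothetical model of two atoms on a line joined by a harmonic bond, with arbitrary constants `K` and `R0`; real force fields in packages such as GROMACS or NAMD contain many more energy terms. Steepest descent stands in for energy minimization (move atoms along the force until it vanishes), and velocity Verlet is one standard way of integrating Newton's equations in molecular dynamics.

```python
K = 100.0  # assumed bond force constant (arbitrary units)
R0 = 1.5   # assumed equilibrium bond length


def energy(x1: float, x2: float) -> float:
    """Potential energy of the harmonic bond, E = 0.5 * K * (x2 - x1 - R0)**2."""
    return 0.5 * K * (x2 - x1 - R0) ** 2


def forces(x1: float, x2: float) -> tuple[float, float]:
    """Forces F = -dE/dx on each atom (equal and opposite)."""
    stretch = x2 - x1 - R0
    return K * stretch, -K * stretch


def steepest_descent(x1, x2, step=0.001, tol=1e-8, max_iter=10_000):
    """Energy minimization: move each atom along its force (downhill) until forces vanish."""
    for _ in range(max_iter):
        f1, f2 = forces(x1, x2)
        if max(abs(f1), abs(f2)) < tol:
            break
        x1 += step * f1
        x2 += step * f2
    return x1, x2


def velocity_verlet(x1, x2, v1, v2, mass=1.0, dt=0.001, n_steps=1000):
    """Molecular dynamics: integrate Newton's equations (F = m a) with velocity Verlet."""
    f1, f2 = forces(x1, x2)
    for _ in range(n_steps):
        # Half-step velocity update, then full-step position update.
        v1 += 0.5 * dt * f1 / mass
        v2 += 0.5 * dt * f2 / mass
        x1 += dt * v1
        x2 += dt * v2
        # Forces at the new positions complete the velocity update.
        f1, f2 = forces(x1, x2)
        v1 += 0.5 * dt * f1 / mass
        v2 += 0.5 * dt * f2 / mass
    return x1, x2, v1, v2


def total_energy(x1, x2, v1, v2, mass=1.0):
    """Potential plus kinetic energy; should stay nearly constant during MD."""
    return energy(x1, x2) + 0.5 * mass * (v1 ** 2 + v2 ** 2)


# Usage: minimize a stretched bond, then run a short MD trajectory.
x1, x2 = steepest_descent(0.0, 2.0)
print(f"minimized bond length: {x2 - x1:.6f}")  # 1.500000

state = (0.0, 1.6, 0.0, 0.0)
e0 = total_energy(*state)
state = velocity_verlet(*state)
print(f"relative energy drift: {abs(total_energy(*state) - e0) / e0:.2e}")
```

Near-constant total energy along the trajectory is a basic sanity check that the integration time step is small enough.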

3. Conclusions

Theoretical-computational models are fast, inexpensive and extremely versatile, and homology models offer countless possibilities for study and application. These structures can be used for drug screening, docking studies, development of new drugs and vaccines, elucidation of binding sites (catalytic and allosteric), molecular dynamics simulations, quantum studies, biomolecular engineering, etc.

The future of molecular modeling is fascinating and promising. With the advancement of computational tools, theoretical models tend to become increasingly accurate and reliable, contributing more and more to biological and biotechnological research and integrating various areas of knowledge with bioinformatics and computational biology.


Acknowledgments

The authors are grateful to the Federal University of Campina Grande and to the Federal Rural University of Pernambuco.

Conflict of interest

The authors declare no conflict of interest.

© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Rafael Trindade Maia, Magnólia de Araújo Campos and Rômulo Maciel de Moraes Filho (March 10th 2021). Introductory Chapter: Homology Modeling, Homology Molecular Modeling - Perspectives and Applications, Rafael Trindade Maia, Rômulo Maciel de Moraes Filho and Magnólia Campos, IntechOpen, DOI: 10.5772/intechopen.95446. Available from:
