Homology modeling is one of the key developments that led to a rapid paradigm shift in the field of computational biology. It derives the three-dimensional structure of a target protein from the similarity between template and target sequences, and it is particularly useful for studying membrane proteins that are hard to crystallize, such as GPCRs, because it provides a better understanding of receptor-ligand interactions. Single- or multiple-template modeling gives profound insights into structurally unsolved yet clinically important drug targets. Homology modeling is often used to work around the problems of crystallizing GPCRs involved in major disease-related pathways, thus paving the way to more structural insights via in silico models when experimentally solved structures are lacking. Owing to the pharmaceutical significance of GPCRs, their structural analysis using techniques like homology modeling is of utmost importance.
- membrane protein
- bovine rhodopsin
- template-based modeling
The comparative modeling of proteins, more popularly known as homology modeling among the research community, is a computational procedure that constructs a three-dimensional, atomic-resolution structure of a ‘target’ protein whose structure is unknown. The model is built from the target’s own amino acid sequence and a known experimental structure of a homologous protein, which serves as the template. This template-based technique is plausible because evolutionarily related proteins share similar structures. This observation led to the widespread use of homology modeling to determine the three-dimensional structures of proteins that were otherwise difficult to solve.
One family of proteins that poses a great challenge is the membrane proteins, owing to their partial flexibility and lack of stability. The surface of membrane proteins is also comparatively hydrophobic, and they can only be extracted from the cell membrane with detergents, which causes challenges at many levels, including expression, solubilization, purification, crystallization, data collection and structure solution. Figure 1 shows the total number of membrane protein structures deposited in the PDB as of August 2020; the data were derived from the mpstruc database, which lists 2037 published reports of membrane protein structures. The number of available membrane protein structures is clearly far below the expected exponential growth in the number of available structures. Although approximately 25% of all proteins are membrane proteins, fewer solved structures are available because membrane proteins are difficult to crystallize.
Many advances are being made in developing novel methods that can help in solving and studying the structures of membrane proteins in a high-throughput manner. The key to overcoming the difficulties of membrane protein structural biology is the underlying fact that these proteins are structurally homologous to their evolutionary relatives. This prompted structural biologists to try a large number of targets, and homologs of each target, so that at least a few proteins would progress through all the steps of a structural study. This is where computational techniques like homology modeling came to the aid of structural biologists, providing the three-dimensional structure of a target protein based on the similarity between template and target sequences. One promising area of membrane protein structural biology is the G-protein-coupled receptors (GPCRs), the largest family of membrane proteins.
The GPCRs constitute a diverse family of proteins in mammalian genomes. The first GPCR for which a structure was determined was rhodopsin, a prototypical class A GPCR. GPCRs are categorized into five major classes based on their sequences as well as on their known or suspected functions in vertebrates: rhodopsin (family A), secretin (family B), glutamate (family C), adhesion and Frizzled/Taste2. The exact number of GPCRs in the human genome is still being determined. One characteristic feature of GPCR structure is the presence of seven transmembrane (7-TM) α-helical segments separated by alternating intracellular and extracellular loop regions, together with an extracellular N-terminus and an intracellular C-terminus. Because of the seven helices, GPCRs are also known as 7-TM receptors or heptahelical receptors. The tertiary structure of a GPCR resembles a barrel, with the seven transmembrane helices forming a cavity within the plasma membrane that serves as a ligand-binding domain. With this unique structure, GPCRs serve many important roles in the human body. Hence the structure-function correlation of GPCRs remains a vital area of research.
The crystal structure of a protein plays a pivotal role in determining its functional importance. However, membrane proteins are difficult to crystallize. Structural studies of GPCRs are complicated by low protein expression levels in native tissues and heterologous systems. Poor protein stability and the multiple conformational states of the receptors are further major hurdles, and GPCRs have been notoriously difficult to crystallize owing to their intrinsic flexibility. In such cases, homology modeling aids in developing three-dimensional models of the proteins. Since many of these receptors lack experimentally solved structures, in silico methods like homology modeling have been applied, using template structures with high homology, to gain insights into their structure and function. Approximately one-fifth of all GPCR structures have been solved, whereas the remaining GPCR structures can be predicted by homology modeling. Building a three-dimensional model with the help of a template helps to predict the structural and functional domains of a protein, which further aids drug discovery.
This chapter deals with the contribution of homology modeling to the structural studies of GPCRs.
2. The importance and multifaceted functionality of GPCRs
The importance of G-protein-coupled receptors (GPCRs) in biology, medicine and pharmaceutical research has been extensively studied, well established and properly documented. Because they play a crucial role in various normal and pathological processes, GPCRs have become a major field of advanced research and a promising focus for drug discovery. Their medical significance stems from their position and function within the human cell: they span the plasma membrane and thus bridge the extracellular and intracellular environments. This enables GPCRs to act as signal transducers, providing a direct mechanism for converting extracellular messages into intracellular responses. In this way, and together with their transmitters and effectors, GPCR systems modulate a broad spectrum of cellular phenomena dictated by the needs of the tissues and organs they serve. The distribution of GPCRs across the vast majority of the body’s organs and tissues, and their primary role in transducing extracellular stimuli into intracellular signals, make them fascinating molecules from the perspective of advanced structural research.
Other roles of GPCRs include modulation of neuronal firing, regulation of ion transport across the plasma membrane and within intracellular organelles, modulation of homeostasis, control of cell division/proliferation, and modification of cell morphology. GPCRs are also an important target for cardiac drug therapy, as decades of research have revealed that they lie at the center of many of the causative factors of cardiovascular disease, such as diabetes, obesity, environmental stressors and genetic factors. Thus understanding the GPCR signaling mechanism in a healthy and an ailing heart may give better insights into treating cardiovascular problems.
There are over 200 cardiac GPCRs, and understanding their structural and functional properties is a key element in understanding the occurrence of heart diseases. G-proteins consist of α, β, and γ subunits, and a great deal of research worldwide has examined the various GPCR signaling pathways in a healthy and an ailing heart. Clinically targeted cardiac GPCRs like adrenergic receptors are responsible for translating chemical messages from the sympathetic nervous system into cardiovascular responses. Other clinically targeted GPCRs include the angiotensin, endothelin, and adenosine receptors. Thus, to study such cardiac GPCRs in depth, structural studies must be carried out before their function is analyzed.
Chemokine receptors, which belong to class A of the GPCRs, are involved in a variety of physiologic functions, mostly related to the homeostasis of the immune system. They are also involved in multiple pathologic processes, including immune and autoimmune diseases, as well as cancer.
Other ailments caused when fundamental pathways governed by GPCRs go awry are asthma, stroke and cerebral hypoperfusion. GPCRs coupled to Gq proteins control airway smooth muscle (ASM) contraction and increased airway resistance. Airway epithelium and hematopoietic cells, which are involved in the control of the lung inflammation that causes most asthma, have various pathways that are mediated by GPCRs. Arrestins regulate GPCR signaling, and structural insights into GPCRs are once again essential for understanding the vital role of arrestins in the GPCR-mediated airway cell functions that are dysregulated in asthma.
3. A brief history on the structural study of GPCRs
GPCRs have been considered among the most desirable drug targets for the past few decades and have been investigated extensively, but their three-dimensional structures have only recently become available. The first step in the structural study of GPCRs came in 2000 with the initial crystal structure determination of bovine rhodopsin (PDB ID: 1F88) by X-ray diffraction. The rhodopsin was purified from bovine rod outer segment (ROS) membranes. Multiwavelength anomalous diffraction (MAD) methods were employed to obtain the phasing information, and diffraction data from the crystallized bovine rhodopsin were collected to 2.8 Å after mercury soaking. This experimental model of rhodopsin became a structural template for other GPCRs because the molecular size of bovine rhodopsin, 348 amino acids, is intermediate among the members of the GPCR family, so the structure features most of the parts that are functionally important in G-protein activation.
A year later, in 2001, the solution NMR method was used to solve the structure of bovine rhodopsin (PDB ID: 1JFP). It then took 7 long years to crystallize the next GPCR, ADRB2 (PDB ID: 2RH1, 2R4R/2R4S [13, 14]). It was solved to a resolution of 2.4 Å using the lipidic cubic phase (LCP) method, which provides a more native, lipid environment for crystallization. This delay was due to the numerous technological advancements required to crystallize membrane proteins like GPCRs. Developments in protein engineering, computational methods like homology modeling, and heterologous protein expression have accelerated the structural determination of GPCRs. In the LCP technique, the proteins are placed in a membrane-like environment where they can diffuse and interact with each other to form crystal lattice contacts through both hydrophobic and hydrophilic regions. These structures served as templates for the crystal structures that were solved afterwards. Other receptors belonging to the rhodopsin family of GPCRs, such as H1R, D3R and 5-HT1B, were solved in the following years and served as templates for the GPCR structures subsequently predicted by homology modeling.
Another important subfamily of class A GPCRs with a number of key physiologic roles is the chemokine receptors. As of 2020, crystal structures have been solved for only five chemokine receptor complexes: CXCR4 (PDB IDs: 3ODU, 3OE0, 3OE6, 3OE8, 3OE9, and 4RWS [18, 19]), CCR5 (PDB IDs: 4MBS, 5UIW, 6AKX, and 6AKY [21, 22, 23]), US28 (PDB IDs: 4XT1, 4XT3, 5WB1, and 5WB2 [24, 25]), CCR2 (PDB IDs: 5T1A, 6GPS, and 6GPX [26, 27]) and CCR9 (PDB ID: 5LWE). Structure-based drug design (SBDD) was key in solving the crystal structures of chemokine receptors, and its potential is reflected by the large number of ligands found for various chemokine receptors. SBDD methods prove to be more effective when a crystal structure is available than when only homology models are used.
In 2011, Kobilka achieved another breakthrough when he and his team captured an image of the β-adrenergic receptor at the exact moment that it is activated by a hormone and sends a signal into the cell. This image is a molecular masterpiece. It was a step on the path that earned Brian Kobilka, along with Robert Lefkowitz, the Nobel Prize in Chemistry in 2012 for groundbreaking discoveries about GPCRs. In 2013, the first secretin family GPCR structures were solved (PDB ID: 4L6R, 4K5Y [30, 31]), and in the following year the first glutamate family GPCR structures were deposited in the PDB (PDB ID: 4OR2, 4OO9 [32, 33]).
Several databases are dedicated to GPCR structures, such as GPCR-EXP [https://zhanglab.ccmb.med.umich.edu/GPCR-EXP/] (a database of experimentally solved GPCR structures) and GPCRdb (web tools and diagrams that aid GPCR research), and they greatly help researchers. According to GPCR-EXP statistics, 389 structures for 67 GPCRs from different species are deposited in the PDB. Figure 2 shows the number of new experimental GPCR structures solved each year, as recorded by the GPCR-EXP database. Many GPCR structures are yet to be solved, and these remain an unturned page in global GPCR research.
4. Role of homology modeling in unraveling the structures of GPCRs: a success story
Protein-based virtual screening requires knowledge of the three-dimensional structure of the targets. Researchers face an overwhelming number of potential targets, like GPCRs, for which little or no experimental 3-D information is available. It is therefore crucial to be able to use not only X-ray or NMR structures but also GPCR models for protein-based virtual screening of chemical libraries. Obtaining significant amounts of pure and active recombinant GPCRs is difficult, and this has been a major obstacle to generating high-resolution three-dimensional structures of GPCRs. Low-resolution structures of either bacteriorhodopsin or bovine rhodopsin paved the way for many early GPCR models, but these models were not reliable enough for structure-based ligand design. The solution of the crystal structure of inactive, dark-state rhodopsin in 2000 was a major milestone in the structural study of GPCRs, as a number of homology models of other class A GPCRs based on this structure have been reported since then.
Crystal structures are generally useful for mapping sequence differences and for analyzing whether an ortholog variant may affect the ligand binding and signaling of a particular GPCR. The first prerequisite for experimentally solving a protein structure is obtaining large amounts of stable, purified, homogeneous protein; the solved structures can then be used as templates to build homology models. By means of in silico methods like homology modeling, crystal structures can be used to predict the effect of such ortholog variants. The first step in developing a homology model is the alignment of fingerprint motifs that are common among the family, which are then extrapolated to assign coordinates for the entire helical bundle. Depending on the specific application, loop regions are either ignored or modeled on the basis of databases of loop conformations. As the template and query sequences used in homology modeling both belong to the GPCR family, the seven transmembrane (TM) helices in the models are placed according to those of the template structure. The RMSD between the model and the template structure should preferably be less than 3 Å. The models are further validated with the help of the ERRAT plot, PROCHECK and VERIFY3D.
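The RMSD criterion mentioned above can be illustrated with a minimal Python sketch. It assumes that the model and template Cα coordinates have already been superimposed and paired residue by residue; the function names, the toy coordinates and the use of Cα atoms only are illustrative assumptions, not details taken from the studies discussed in this chapter.

```python
import math

def ca_rmsd(model_coords, template_coords):
    """RMSD (in Å) between two equally sized lists of (x, y, z) Cα
    coordinates that are assumed to be already superimposed and paired."""
    if len(model_coords) != len(template_coords):
        raise ValueError("coordinate lists must have the same length")
    if not model_coords:
        raise ValueError("coordinate lists must not be empty")
    squared_sum = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model_coords, template_coords):
        squared_sum += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared_sum / len(model_coords))

def passes_rmsd_cutoff(model_coords, template_coords, cutoff=3.0):
    """True if the model-template RMSD is below the cutoff (3 Å in the text)."""
    return ca_rmsd(model_coords, template_coords) < cutoff

# Hypothetical toy coordinates: every model atom is shifted by 1 Å along y.
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(round(ca_rmsd(model, template), 2))  # prints 1.0
```

In practice the two structures would first be optimally superimposed (for example with a least-squares fitting routine) before the deviation is measured; that step is deliberately left out of this sketch.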
One of the test cases in which homology modeling proved effective in the structural study of GPCRs is the work of Bissantz et al., where 3-D models of the D3, β2, and δ-opioid receptors were generated for future agonist screening, as several full agonists were already known for each of these GPCRs. Many GPCR models were set up to examine whether the “activated state” of GPCRs is conformationally more flexible than the antagonist-bound ground state. Apomorphine and pergolide (D3 receptor), epinephrine and nylidrine (β2 receptor), and SNC-80 and TAN67 (δ-opioid receptor) were the agonists used for the refinement, and two agonist-bound models were built for each receptor. An alternative activated-state model was also generated by substituting the single ligand-biased receptor to do comparative studies. When the amino acid sequences of the target receptors were aligned to the sequence of the bovine rhodopsin template, the alignment coincided with the known structural features of GPCRs. Despite the low sequence identity over the whole TM sequence, the structurally and functionally important amino acids were highly conserved or replaced by amino acids of high similarity. These GPCR models are static, whereas proteins are in reality more or less flexible, which gave rise to further problems in docking. GPCR models based on a template with an identity of 20–30% can be expected to be of higher accuracy than models of other types of proteins based on a template with low sequence identity, and this proved to be true in this test case, in which the antagonist-bound state models of three human GPCRs were shown to be suitable for virtual screening of GPCR antagonists. Single-template models, however, were seen to be less reliable.
This was because all GPCR models that were used as templates had been derived from the inactive state of bovine rhodopsin, which is closer to an “antagonist-bound state” than to an “agonist-bound state” of the target GPCR; although the active site can be expanded, the subsequent conformational changes of the receptor activation process could not be simulated. A similar unreliability of single-template models was observed in the GPCR homology models developed for β2AR. Many models exist for β2AR, some of which have been improved upon with supporting biochemical data. All of these models were more similar to rhodopsin than to β2AR, mainly because they were homology models generated from a single structural template. The addition of multiple structural templates and conformational states to the pool of information on GPCRs later paved the way to a new generation of more potent therapeutics targeting the GPCR family. The unreliability of single-template modeling cannot be judged conclusively, however, as these studies were conducted at a time when only a few templates were available. Judith Varady et al. used the bovine rhodopsin template to build a model of the dopamine 3 (D3) receptor subtype, which is a promising target for treating drug addiction. The transmembrane helical region of the D3 receptor, which includes the ligand-binding site, was modeled on the bovine rhodopsin template, with a sequence identity in the twilight zone (28%).
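The sequence identity figures quoted above can be computed from a pairwise alignment as in the following minimal sketch. It assumes that the two sequences are already aligned with "-" as the gap character and that identity is reported over the aligned (non-gap) positions; other denominators, such as the length of the shorter sequence, are also in use. The 20–35% bounds for the twilight zone, the function names and the toy sequences are assumptions made for illustration.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity over aligned, non-gap columns of a pairwise alignment."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned_pairs = 0
    identical = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        aligned_pairs += 1
        if a == b:
            identical += 1
    if aligned_pairs == 0:
        return 0.0
    return 100.0 * identical / aligned_pairs

def in_twilight_zone(identity, low=20.0, high=35.0):
    """True if the identity falls in the assumed 20-35% twilight zone."""
    return low <= identity <= high

# Hypothetical toy alignment (not a real GPCR sequence).
target = "ACD-FGHIK"
template = "ACDEFG-LK"
print(round(percent_identity(target, template), 1))  # prints 85.7
print(in_twilight_zone(28.0))  # prints True
```

Restricting the alignment to the transmembrane helices before calling such a function is one way to obtain the higher TM-only identities reported for GPCR templates.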
A three-dimensional model of the human CCR5 receptor was developed by Fano A. et al. using a homology-based approach starting from the X-ray structure of bovine rhodopsin. The reliability of the models was assessed using molecular docking and molecular dynamics studies. At the time of this work, no experimentally solved three-dimensional chemokine receptor structure was available, which was a major hurdle for deeper research on the structural properties of these receptors. The main ways to investigate the properties of CCR5 were therefore homology modeling studies along with site-directed mutagenesis (SDM). A new model of CCR5 was built after consolidating all the information from the previously built models and incorporating extensive molecular dynamics (MD) simulations. Furthermore, flexible docking of the synthetic antagonist TAK779 and a novel docking protocol for the natural agonists RANTES and MIP-1β were employed to develop the CCR5 models. The first crystal structure of bovine rhodopsin by Palczewski et al. served as a suitable template for this model: the sequence identity, previously less than 20%, increased to ∼30% when only the transmembrane helices (TMHs) were considered, and several of the amino acid residues essential for maintaining CCR5’s architecture and receptor function were highly conserved. Pairwise alignment between the template and human CCR5 was carried out in CLUSTAL W, and the anti-parallel β-sheet of the second extracellular loop (ECL2) was found to have higher sequence homology to the template. Of the four cysteines that form two disulfide links in CCR5, the Cys101-Cys178 pair involves the anti-parallel β-sheet of ECL2, and this loop was therefore constructed by homology from the template structure using MODELLER 6.2.
The Cα Cartesian coordinates of the seven transmembrane helices and ECL2 were copied from the corresponding template (PDB: 1F88), and the N-terminal domain and the remaining loops were built de novo using MODELLER 6.2. The models were checked using PROCHECK, and the selected model was used as the input structure for MD loop refinement. The resulting model, consisting of the TMHs and all ECLs and ICLs, was validated by MD conformational analysis, which showed it to be consistent with the SDM data available at the time, and was used to gain insights into the molecular basis of the initiation and development of HIV-1 infection. This information could be useful in the rational design of HIV-1 entry blockers.
In 2012, before the structure of CCR2 had been solved, Gugan et al. investigated the binding site of CCR2. A comparative model was generated using the structure of CXCR4 (PDB ID: 3ODU), elucidated in 2010, as the template. One of the key findings, along with the binding site residues, was a disulfide bridge between Cys113 and Cys190 in the selected CCR2 model, which was later also observed in the crystal structure elucidated in 2016 (PDB ID: 5T1A).
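Whether a model contains a disulfide bridge such as the one described above can be checked geometrically from the distance between the two cysteine Sγ atoms. The sketch below is only an illustration: the 2.5 Å cutoff is an assumed tolerance around the typical S-S bond length of roughly 2.05 Å, and the coordinates are hypothetical rather than taken from any CCR2 model or crystal structure.

```python
import math

def is_disulfide(sg_a, sg_b, max_distance=2.5):
    """True if two cysteine Sγ atoms, given as (x, y, z) in Å, are close
    enough to be bonded. The 2.5 Å cutoff is an assumed tolerance."""
    return math.dist(sg_a, sg_b) <= max_distance

# Hypothetical Sγ coordinates for two cysteine pairs.
bonded_pair = ((10.00, 4.00, 7.00), (12.05, 4.00, 7.00))
distant_pair = ((10.00, 4.00, 7.00), (15.00, 4.00, 7.00))
print(is_disulfide(*bonded_pair))   # prints True
print(is_disulfide(*distant_pair))  # prints False
```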
In a similar manner, Changdev et al. in 2013 developed a 3-D model of CCR5 with MODELLER 9.2, using CXCR4 (PDB ID: 3ODU; resolution 2.5 Å) as the template, to explore the binding site of the receptor. Significantly, the modeled structure coincides with the crystal structure of CCR5 (PDB ID: 4MBS), which was determined by Tan et al. in 2013.
The research by Anand et al. in 2011 on the accuracy of homology modeling compared the reported models with the crystal structure of CCR5 (PDB ID: 4MBS). The findings highlighted the importance of multi-template models, which have their own merits and demerits, in gaining structural insights into the receptor. The inhibitor Maraviroc was docked to single-template and multi-template models built from bovine rhodopsin (PDB ID: 1F88), β2 adrenergic receptor (PDB ID: 2RH1) and CXCR4 (PDB ID: 3ODU). The critical salt-bridge interaction between Maraviroc and Glu283 of the receptor was observed in both the modeled structures and the crystal structure.
When building a model of a particular GPCR, many models are usually constructed with varying side chains and an almost identical backbone. This is done to check which of the constructed models shows maximum affinity towards various ligands, and the model showing a consistent binding mode is selected for further analysis. An example is the study by Mateusz N et al., where 400 homology models of the serotonin 5-HT1A receptor, one of the most documented monoamine GPCRs, were built using MODELLER 7v7 with the crystal structure of bovine rhodopsin (PDB: 1F88) as the template. These models varied considerably in their side chains, but the polypeptide backbone varied only marginally from the template. Arylpiperazine test ligands were docked to all 400 models with default parameters and without any constraints. A detailed analysis of the docking poses revealed that crucial ionic bonds were formed almost exclusively in receptors with the gauche(−) conformation of the Asp3.32 χ1 angle. Such insights led to the development of 200 new homology models with these changes incorporated. Molecular docking was then repeated on all 200 new models, and the complexes were scored using various scoring functions to choose the best models.
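The side-chain χ1 torsion angle referred to above is the dihedral defined by the N, Cα, Cβ and Cγ atoms of a residue. The following minimal sketch computes such a dihedral and assigns a rotamer label. The labeling is an assumption: gauche(−) is taken here to mean χ1 between −120° and 0° (near −60°), but the opposite naming convention also appears in the literature, so the convention of the original study should be checked before reusing these labels. The coordinates are hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (N, CA, CB, CG for chi1)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = tuple(b0[i] - _dot(b0, b1) * b1[i] for i in range(3))
    w = tuple(b2[i] - _dot(b2, b1) * b1[i] for i in range(3))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def classify_chi1(angle):
    """Assumed convention: gauche(-) near -60, gauche(+) near +60, else trans."""
    if -120.0 <= angle < 0.0:
        return "gauche(-)"
    if 0.0 <= angle < 120.0:
        return "gauche(+)"
    return "trans"

# Hypothetical atom positions giving chi1 = -60 degrees.
n, ca, cb = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
cg = (math.cos(math.radians(-60.0)), math.sin(math.radians(-60.0)), 1.0)
chi1 = dihedral(n, ca, cb, cg)
print(round(chi1, 1), classify_chi1(chi1))
```

Applied to every model in an ensemble, a filter of this kind would retain only those models whose Asp3.32 side chain adopts the desired rotamer before docking.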
The past few years have seen remarkable advances in the structural biology of G-protein-coupled receptors (GPCRs), and dedicated databases exist to study them. Structural studies of GPCRs serve various goals, and these goals trigger myriad scientific investigations. For the GPCRs whose structures have now been solved, the homology models developed earlier based on rhodopsin were the first step in their structural study. Owing to the increase in available GPCR structures, the templates used to build homology models of GPCRs have shown a marked increase in similarity and query coverage in recent years. This improves the quality of the models that are built from newly elucidated GPCR structures.
Research on GPCRs is a global endeavor, and it is possible only with insights from structural studies of GPCRs. Owing to the difficulty of crystallizing GPCRs, their structural study was once thought to be impossible. But with technological advances in computational techniques, it became possible to build a model structure based on the homology of a particular receptor to a template structure. Thus homology modeling, and models generated with tools like MODELLER, opened up unexplored areas of GPCR research. These models served a greater purpose for the pharmaceutical industry, in which GPCRs became prominent drug discovery targets. The many models constructed using previously solved structures as templates were further scrutinized for their ability to show a consistent binding mode with various ligands. Recent times have seen the use of cryo-EM techniques in solving GPCR structures. Still, the contribution made by techniques like homology modeling to the structural study of GPCRs will always remain a milestone.
Conflict of interest
The authors declare no conflict of interest.