A Brief View of Molecular Modeling Approaches to P2 Receptors

Purinergic receptors are a class of receptors distributed into two groups, P1 and P2. P1 receptors are activated by nucleosides, like adenosine, while nucleotides active P2 receptors. In turn, P2 receptors comprise two families, metabotropic P2Y and ionotropic P2X. P2Y receptors consist in eight members, namely, P2Y1, P2Y2, P2Y4, P2Y6, P2Y11, P2Y12, P2Y13, and P2Y14, described in mammals, while P2X includes seven members, numbered P2X1 to P2X7. These receptors have been described as expressed in practically all cells studied to date. In this context, P2 receptors are suggested as participating in certain diseases. The general approach applied in the discovery of new drugs is expensive and lengthy. Alternatively, in the last 20 years, molecular modeling has emerged as an exciting tool for the design of new drugs, in less time and at low costs. These tools allow for in silico testing of thousands of molecules against a target protein, as well as toxicity, absorption, distribution, metabolism, and constant affinity predictions of a given interaction. Thus, molecular modeling algorithms emerge as an increasingly important tool for the design of drugs targeting purinergic receptors as therapeutic targets of many diseases, including cancer, pain, inflammation, cardiovascular, and endocrine conditions.


Introduction
Purinergic receptors are a class of transmembrane proteins activated by extracellular nucleotides and nucleosides. They consist in the P1 receptor (mainly activated by adenosine) and P2 receptors distributed into two families, P2Y (G proteincoupled receptors) and P2X (ionotropic receptor).
P2Y receptors comprise eight members in mammals (P2Y1, P2Y2, P2Y4, P2Y6, P2Y11, P2Y12, P2Y13, P2Y14) and are activated mainly by adenosine diphosphate and uracil nucleotides, such as UTP and UDP. Regarding distribution, P2Y receptors are widely distributed throughout the organism. The structure of P2Y receptors comprises seven transmembrane segments with an extracellular amino terminal, an intracellular carboxyl-terminal, and three segments connecting the transmembrane domains, and two subtypes have been resolved by X-ray crystallography, P2Y1 and P2Y12 [1-3].
The first resolved P2Y structure was P2Y12 in 2014, in two papers published by Zhao group describing the crystal structure (Figure 1) of the receptor bound to an antithrombotic drug and two agonists (2MeSADP and 2MeSATP) [1,3]. As this receptor is related to thrombosis, with at least three drugs available in the market, this finding was relevant to address the development of new antithrombotic drugs. Surprisingly, the authors also reported a different behavior of the receptor linked to nucleotide and non-nucleotide ligands, presenting different binding pockets that share only a small portion of the site.
Another subtype that has been resolved by X-ray crystallography is P2Y1 that plays a similar role in thrombosis, facilitating platelet aggregation, and that could be a new candidate for the discovery of antithrombotic treatments. In this case, the resolution was even higher, at 2.2 Å, than P2Y12. The authors described two distinct binding pockets, one allosteric and another orthosteric. The allosteric site is located close to the lipid bilayer, while the orthosteric site is formed by an extracellular loop-2, VI and VII helices, and a portion of the amino-termini domain. It is important to note that the binding modes are entirely different from that reported for P2Y12.
Regarding P2X receptors, these are nonselective cation channels gated by extracellular ATP, also expressed in various systems throughout the organism [4-6]. To date, seven different P2X subtypes (P2X1-P2X7) have been found, assembled as homo-or heterotrimers with two transmembrane segments, an extracellular ligand binding loop and both intracellular amino and carboxy termini that modulate channel gating [7,8]. In general, the P2X receptors when activated open a channel permeating mono and divalent cations except for the P2X5 that permeates anions as. Another receptor that has a different behavior is the P2X7 that opens a channel activated by low concentrations of ATP, but under high concentration stimulation (>100 μM), it opens a pore, allowing the passage of molecules up to 900 Da, including fluorescent molecules [9][10][11]. With the recent crystal structure determination of some P2X receptors, some of the questions and hypothesis gathered in previous studies were confirmed, and it was also possible to postulate a reasonable mechanism for channel activation [8]. However, the molecular machinery involved in the high-conductance pore formation remains uncertain and under discussion.
A significant advance in the understanding of P2X receptors was the determination of the crystal structures of zebrafish P2X4 (zfP2X4) in both the apo-(closedchannel state) and ATP-bound (open-channel state) forms ( Figure 2) [12]. Before this, assumptions concerning the receptor structures were carried out through bioinformatics and molecular and biochemical studies, since overexpression, reconstitution, and the overall structure determination of membrane proteins are challenging [13]. In addition, no sequence homology exists between P2X and other ion channels [14]. Therefore, the crystallographic structures of zfP2X4 demonstrated the particular three-dimensional organization of the P2X family and the folding of each subunit, marking a new era for P2X studies.
The architecture of P2X receptors is compared to a dolphin [8] (Figure 1), where the highly conserved residues, found in a rigid β-sheet structure, form a sizeable glycosylated ectodomain denoted as the dolphin body. Four other flexible domains make up the head, dorsal fin, and left and right flippers. The TM1 and TM2 transmembrane domains (dolphin tail) are organized into alpha helices, and the ion-conducting channel is formed by the association of three of the TM2 alpha helices. It was also possible through the zfP2X4 crystal structures to confirm the ATP-binding site region [12]. The residues are located within a cavity surrounded by the head, left flipper, and dorsal fin [8]. Besides, two possible ion pathways were recognized. Ions theoretically either pass through the channel vertically, through three related vestibules positioned along the central axis of the receptors, or laterally through three fenestrations positioned above the gate region, located approximately halfway down the length of the TM2 helix [8].
Over the years, crystal structure determination and in silico modeling allowed for inferences regarding gating action. Gating may begin with ATP binding that incites conformation changes all around the receptor and the contraction of regions surrounding the ATP-binding site. Subunits bend outside and rotate, flexing the lower regions of the receptor and the extension of the lateral fenestrations. Since the lower areas of the receptor are connected to the transmembrane domain, their movement displaces the outer ends of the TM helices, followed by pore opening [15].
Recently, other P2X crystal structures have been determined, such as the Gulf Coast tick P2X (gcP2X) receptor [7], the human P2X3 (hP2X3) receptor [16,17], the panda P2X7 (pdP2X7) receptor [18], and the chicken P2X7 (ckP2X7) receptor [7]. Surprisingly, it has been demonstrated that some antagonists thought to be competitive antagonists were, in fact, allosteric antagonists. These structures led to an understanding concerning channel activation, desensitization, and the recognition of P2X family agonists and antagonists. On the one hand, the elucidation of the various P2X structures opens the possibility of structural comparisons between subtypes. For instance, the ckP2X7 conformation was noted as presenting an extended, incompletely activated conformation of the channel and a different ATP recognition mechanism compared to other P2X crystal structures [7]. On the other hand, the availability of crystal structures in both open-and closed-channel states also increases the use of computational methods to predict P2X receptor structures from sequences and aid in drug design studies targeting the P2X family.

In silico approaches applied to drug discovery
The classical pharmacology approach to study protein and ligand interactions searching for new drugs based on high-throughput screening with thousands of molecules takes up significant amounts of time and financial resources. The computational approach, on the other hand, allows for substantial overall cost reductions.
In silico methods comprising pharmacophore modeling, homology models, quantitative structure-activity relationships, molecular docking, or databases are mostly used alongside other experimental techniques, such as in vitro methodologies, creating models to test, discover, and optimize molecules to attribute a pharmacological target. The in silico approaches to drug discovery or computational pharmacology and therapeutics comprise biomedical data for simulations and predictions, suggesting hypotheses and aiding in creating shortcuts between the idea and the designed drug, whereas all new drug design process takes about 15 years [19].
In silico approaches alter the way the pharmacology industry discovers and optimizes drugs, using human genome information. The genomic sequence emerges with confidence in biological and chemical sciences [20,21]. The various genesequenced species to date, including humans, have contributed to increasing all databases and databanks worldwide. Besides, the new biochemistry challenge is to apply this information to solve the 3D structure of macromolecules, mainly proteins (i.e., enzymes, receptors). As these structures order the major functionality of the human body, reading, solving, and interpreting this structural information may be the future of medicine. Elucidating protein structures opens up possibilities for drug research using the solved 3D structures as molecular targets [22]. However, although available gene and protein information exist, a knowledge gap is also observed [23]. In this context, structure-based methods are frequently used to predict or optimize new drugs, enabling both improvements and predictions concerning pharmacodynamic (bioactivity and toxicity) and pharmacokinetic (absorption, distribution, metabolism, and excretion) properties [24].
However, each approach was developed for a specific proposal, and none was capable of solving all issues. In this sense, over time many researchers have noted that mixed strategies can be used as complementary, filling the gaps present in each method. This point of view emerges to aid in the study of human P2 receptors, as no experimental 3D structure is available. In this sense, comparative modeling based on homologous template proteins associated with molecular docking and a virtual screening (VS) approach is a standard protocol for computer-aided drug design. The following section details some procedures applied to computational structureactivity relationship studies.

Comparative modeling
The experimental determination of a protein structure is frequently delayed by difficulties in the cloning, expression, and purification of the target protein. Additionally, technical and methodological problems concerning crystallization can also occur, which may nullify the effort to obtain a crystal structure. Therefore, methods that deal with structure prediction are highly sought out. One of these methods is termed comparative modeling, also known as homology modeling. This method is based on the concept that two proteins from the same family, which share similarities in their amino acid sequence, will display similar three-dimensional structures, as the degree of conservation of three-dimensional structure elements in a protein family is higher than sequence conservation.
The precision of comparative modeling relies on the sequence alignment between what is called the template (related homologous protein) and the target sequence (protein to be constructed) [25]. Usually, the sequence identity must be higher than 70% to ensure a successful prediction, as sequence identities lower than 30% may produce models presenting inaccurate structural predictions. Membrane proteins, especially transmembrane domains, can be constructed into acceptable models with sequence identities equal or higher than 30%, although the alignments in soluble regions may be less accurate [26]. In addition to sequence identity, inherited structural deficiencies, such as crystal packing and the presence of detergents for solubilization from the template structure, may also contribute to an unreliable predicted model. Popular comparative modeling programs include MODELER, SWISS-MODEL, and ROSETTA [27][28][29].
Predicted receptor structures for P2X1, P2X2, P2X3, P2X4, and P2X7 have been constructed in several studies (see more details in the review by Grimes and Young) [30]. These structural studies provided evidence to the identification of the critical amino acid residues and regions of the P2X receptor function have been previously investigated only by experimental assessments. In addition, comparative modeling has led to interesting developments regarding modified P2X receptors, by point mutations in important residues. Mutations in crucial residues can introduce different mobility concerning channel opening and closing, as well as different interactions between the receptors and ATP [8]. For example, the substitution of glycine, located in the lower body, to alanine in the P2X4 receptor resulted in a more rigid structure, decreased ATP sensitivity, slower activation, and desensitization [31].

Molecular docking
One of the first steps in a computational drug discovery campaign is molecular docking. Molecular docking, or simply docking, is a computational method that predicts the favored binding mode of one molecule to another when they may form a complex. Usually, the binding molecule, known as a ligand, has its rotational or translational space probed, while the other particle, known as the receptor, remains rigid. A receptor structure can originate from modeling or experimental approaches, such as comparative modeling and crystallographic structures. Several docking programs and tools are available for both academic and commercial uses, such as AutoDock [32], DOCK [33], Glide [34], and GOLD [35]. Docking programs consist of a search algorithm and score functions that return a conformation of the ligand, known as a pose.
In the algorithm search, the ligand conformation is evaluated recursively until convergence to minimum energy is reached [31]. In docking methods, the energy is estimated by a scoring function that ranks the conformations, usually by the sum of electrostatic and van der Waals energies. A docking run does not require a lot of computational power, taking from seconds to a few minutes at most to detect a conformation [36]. Although docking approaches are fast, one of the crucial operational issues is obtaining accurate results due to flexibility and score function limitations [36]. Therefore, a docking result should not be viewed as a drug, but just as a priority candidate for optimization and experimental testing.
With the existence of known P2X receptor three-dimensional structures, searching for compounds that selectively activate or block specific P2X receptor subtypes indicates promising therapeutic targets for many diseases, such as cancer, pain, inflammation, cardiovascular, and endocrine conditions [37]. To date, numerous P2X ligands have been described, mostly derivatives from the endogenous agonist ATP developed to increase ATP stability and reduce its degradation rate by extracellular ectonucleotidases [38,39]. Other structural compound classes, such as nucleotide derivatives, irreversible antagonists, suramin-like analogs, and pyrimidine or purine derivatives, have been reported [37]. Therefore, strategies that predict the druggability of a compound to a target and, in turn, contribute and accelerate optimization are frequently applied in drug discovery campaigns. Recently, a study performed an effort in the field of structure-based modeling and the understanding of interactions between compounds with drug-like properties and P2X receptors. Dal Ben and colleagues used comparative model structures for rP2X2 and hP2X2, using the zP2X4 crystal structure as a template, to perform docking experiments concerning ATP and ATP analogs as P2X agonists in the ATPbinding site [37]. In addition, they also assessed nucleotide and non-nucleotide antagonists (a P2X3 antagonist, suramin derivatives, anthraquinones, azoles, and cyanoguanidine derivatives) in different binding sites. The docking results indicated agonist interactions, which may act in a comparable way to endogenous ATP [37]. It is also clear that, alongside mutagenesis analyses, comparative modeling and docking can provide key residues regarding P2X receptor agonist and antagonist potency and selectivity.
This technique was also applied to test specific ligands that already have been tested on the wet bench. Some authors demonstrated that, in a P2Y1 model based on the X-ray crystallography-resolved rhodopsin structure, some antagonist affinities were very similar to that obtained in experiments [40]. Other P2Y member models have been described based on other G protein-coupled receptors, to discover new antagonists; see more details in Jacobson et al. [41].

Virtual screening
Molecular docking experiments of a compound series are important and enlightening. However, in silico screening of large chemical libraries may improve the chance of finding candidates for optimization. Therefore, virtual screening is the computational and low-cost counterpart of time-consuming high-throughput screening strategies [42]. Virtual screening can be structure-based, using molecular docking as a method, or ligand-based. Compound libraries to be screened can be formed by thousand to millions of compounds presenting similarities, or not, to known target ligands. However, only a small percentage of compounds from the top-ranking conformations can be examined for interaction patterns and prioritized to be purchased or synthesized [42].
Regarding the P2Y family, a study applying virtual screening was carried out by Costanzi et al.
[43], who tested 250,000 molecules for P2Y1. At the end of the screening, they selected only 110 hits clustered in 59 groups. The substances presented antagonistic behavior in a low molar range; however, compound optimization was still required [43].
With the help of structure-based virtual screening, Caseley and co-workers were able to identify three novel hP2X7 antagonists with micromolar potency (IC50 < 6 μM) from a library of over 100,000 structurally diverse ligands [44]. VS was performed in the hP2X7 ATP-binding pocket (enclosing 10 Å of the site), and 42 of the highest-ranked scores were commercially available to be tested. Ligand similarity was also considered and used in inhibition tests. From functional testing, one of the three compounds showed favored inhibition of the large pore formation without consequence to ion channel function. Studies such as this suggest that P2X receptors are indeed an attractive pharmacological target and that computational finds can corroborate experiments and vice versa.

Molecular dynamics
In their most basic application, structure-based methods rely on a single "snapshot" of the target protein. This factor may be due to the lack of experimental or reliable modeled structures or just the reduction of docking and virtual screening computational demands [45]. However, with the recent progress in computational power, methods that account for structural flexibility can replicate in solution dynamics. One of the most applied methods to investigate protein motion, with or without a ligand, is known as classical molecular dynamics simulation. Molecular dynamics (MD) comprises simulations based on Newton's motion equation and can designate the progress of the conformational state and energy landscape as a function of time in a feasible time scale [42]. Classical MD is used because it includes temperatures and pressure effects, provides an alternative interpretation for experiments, aids in obtaining molecular level information, and is computationally cheaper than other MD approaches, such as quantum mechanics/molecular mechanics (QM/MM) or quantum chemistry (MD/QC). One of the reasons for this is that QM methods use the Schrodinger equation solution to represent the electronnuclear relation considering static nuclei, since in classical MD atom nuclei use an average field to describe the electrons [46-48].
Despite the advances in computational power, molecular dynamics simulations still present a limited time scale, ranging from hundreds of nanoseconds to hundreds of microseconds depending on the system, making its length not enough to sample most biological processes. In the case of transport through membranes or channels, the massive, conformational changes during gating are complicated and time-consuming, due to systems with over 200,000 atoms (receptor structure, water molecules, lipid bilayer, and neutralizing ions) accomplishing, at most, hundreds of nanoseconds [49]. However, even with limited simulation times, interesting structural information can be obtained. For instance, regarding a molecular dynamics simulation of the crystal structure of zfP2X4, it was possible to infer that ATP binding may terminate hydrophobic interactions between the left flipper and the dorsal fin, producing a downward movement of the left flipper and upward motion of the dorsal fin [50]. Moreover, molecular dynamics predicted ion passage through the lateral fenestrations, which were confirmed later in cysteine accessibility assay experiments [12,51].
Nevertheless, molecular dynamics assessments have only been employed in a few studies. Alternatives to sidestep time scale and length shortcomings in the ion channel, such as coarse-grained simulations and enhanced sampling methods (replica-exchange molecular dynamics, metadynamics, and simulated annealing), are performed instead of classical molecular dynamics. In coarse-grained methods, a system is represented by a reduced number of degrees of freedom, thus eliminating some interactions and allocating fewer resources than all-atom representation, while enhanced sampling methods can help the system cross specific high-energy barriers that separate low-energy conformations. Recently, a coarse-grained simulation of a model structure of rP2X2 within a lipid bilayer was performed [44], and the authors observed that lipids were interposed between the transmembrane helices during the simulation, thus indicating that these molecules may be able to stabilize the open receptor state [30]. All data about the results concerning the techniques described above (comparative modeling, molecular docking, and molecular dynamics) could be seen in Table 1.

Artificial intelligence applied to drug discovery
Artificial intelligence (AI) is considered the hallmark of The Fourth Industrial Revolution [52]. AI comprises several definitions but can be defined briefly as a part of computer science capable of performing activities ascribed to human intelligence, such as solving problems, with similar and, in some instances, superior cognitive abilities, for instance, in some cases involving image recognition and playing games, such as Go, Chess, and Shogi [53]. It is important to point out that the game tree complexity of Go is 10 360 [54]. AI is implemented through machine learning (ML) algorithms to be used in different fields.
Currently, ML tools have been used to rapidly identify biologically active molecules from libraries containing thousands of substances in a less costly manner compared to a bench approach [55, 56]. As mentioned previously, several ML algorithms are available, such as k-nearest neighbors, Naïve Bayesian, decision trees, deep neural networks, support vector machines (SVM), and random forest, although the two most applied in this decade which aimed at drug discovery have been SVM and deep neural networks [57]. SVM was developed by Cortes and Vapnik, presenting applications in several fields [58]. Essentially, a hyperplane (in a Cartesian plane, for instance, with one variable per line) separates two samples by attempting to find the maximum distance between data points and the hyperplane. The closest points in each side are termed support vectors. An excellent review on SVM applied to drug discovery has been recently published by Maltarollo [59]. SVM can predict interactions between ligand and receptors based on physicochemical features and protein and compound descriptors, regardless of structural information.
Artificial neural networks are a mathematical representation of neurons, based on how they work in the animal brain, containing several hidden layers, in some cases. Several subtypes have been noted, but a central character is to train the network to learn specific task patterns. In other words, the algorithm "writes" itself through modifications, applying certain variables such as weights and bias in the neural network. The work of a neural network is similar to the brain plasticity that occurs in animals in order to learn new tasks, where new synaptic contacts are formed, producing other biological networks. An interesting review of this issue has been recently published by Carpenter [57]. Several subtypes of artificial neural networks have been applied in drug discovery [57].
Nevertheless, the future is promising, since several drugs have been developed applying ML. Besides, the new Google AlphaZero program has learned to play several games by self-playing and has defeated machine learning programs such as AlphaGo and AlphaGo Zero, which had previously defeated human Go champions 9 [53, 55, 60,61]. To date, no medicine has been produced with the aid of machine learning algorithms, and no studies on P2 receptors are available.

The in silico approach: putative applications and disadvantages
Bioinformatics is a multidisciplinary field that comprises knowledge on chemistry, biology, and physics. This type of approach has emerged as a prevalent tool in some areas of research, mainly those engaged in the discovery of new compounds. The computational method user includes pharmaceutical industries that rather than spending money and time on the selection and optimization of synthetic molecules would be able to test several parameters of thousands of substances, at low costs [62][63][64]. The development of a new compound takes from 10 to 16 years comprising the initial steps of preclinical studies (also using a lot of test in animals) and, Table 1. P2 receptors and In Silico approaches widely applied to elucidate tridimensional strutuctures, interactions, and binding mechanisms between P2 receptors and their suitable ligands.
later, clinical trials, which cost about 1.2 billion, with the added factor that the molecule can still fail during any phase of the trial. Thus, computational approaches can be a low-cost option to test, at atomic resolution level, hypothesis that might require costly lab experiments.
Computational approaches also have their disadvantages. In general, in silico methodologies work with the concept of models. Models are an approximation of reality; thus, they give only partial information of the whole biological system. For example, structure-based methods, such as comparative modeling, molecular docking, molecular dynamics, and other mentioned methods, often rely on threedimensional models from previously experimentally determined structures. However, experimentally determined structures also carries out uncertainties introduced during the transformation from observed electron density data to an atomic level structure. The position of protein atoms, ligands, water molecules, and other cofactors might be affected by crystallization conditions, and the difficulties in using a physiologically relevant environment might lead to ambiguous conclusions in the structure determination. Therefore, the accuracy of structure-based methods might be questionable due to these unreliable models.
Approximation of reality is also present in software's algorithms. Molecular docking, for instance, often disregards protein flexibility to speed the process, and their ability of predicting binding affinities is limited by their small derivation/ training set used by the scoring function. Also, molecular dynamics have limited time scale and sampling, and the physical models (force fields) used are not intended for all types of biologically appropriate molecules failing in reproducing properties under different temperature, pressure, pH, ion concentration, and solvent types. Some approximations can be overcome by increasing computational power and using of graphical units; however, this might go against the cost efficiency promoted by means of in silico approaches, since it might be expensive to build and maintain a cluster of computers or supercomputers.
In relation to the P2 receptors, a lot of data has been generated by computational approaches (or molecular modeling), even describing some features not detailed by common molecular and structural techniques as the passage of the ions through the lateral fenestration and more recently the docking of some molecules inside the pore of the P2X7 suggesting an allosteric pocket of binding molecules [72]. Although there is no compound approved by FDA targeting P2 receptors derived from molecular modeling, various compounds are under investigation in the preclinical phase.
So, synthetic engineering is only in the beginning and will, in a very close future, aid in facilitating an enormous number of processes in biology and other areas.

Conclusions
Bioinformatics is a growing subject of study that was incorporated in various fields of knowledge as engineering, physics, chemistry, and biology. This growth was strengthened by hardware and software improvements, mainly in the opensource software. In biology/chemistry, the most used is the molecular modeling, a field of science that encompasses all approaches, mimicking and modeling the behavior of molecules. The pharmaceutical industry has added routinely the in silico experiments to select hits and subsequently new drugs. And in the future, it seems that these techniques will be routinely applied in other fields, facilitating results and saving time and cost. Moreover, the artificial intelligence development could be directed to retouch weak point in molecular modeling approaches. Notwithstanding, the new age began in which every process will be facilitated by machine learning algorithms.