Experimental and theoretical structural parameters of 5-nitrofuran-2-aldoxime.
In this chapter, we first briefly review aspects of quantum chemistry approximations, the molecular electrostatic potential (MEP), and chemometric techniques, which are recognized as important tools in the development of chemical science and are frequently used in the study and design of bioactive compounds. We then use MEP and pattern recognition (PR) techniques as tools to design nitrofuran compounds with biological activity against Trypanosoma cruzi (T. cruzi). PR models (PCA, HCA, KNN, SDA, and SIMCA) were constructed and showed that 23 nitrofurans can be classified into two classes, more active and less active, according to their degree of activity against T. cruzi. Four properties are responsible for this classification: the charge on the N atom of the nitro group (QN1); the difference between the highest occupied molecular orbital (HOMO) energy and the lowest unoccupied molecular orbital (LUMO) energy (gap energy); the 3D molecule representation of structures based on electron diffraction (3D-MoRSE) descriptor at signal 05, unweighted (Mor05u); and the Moriguchi octanol–water partition coefficient (MlogP). Notably, these properties represent three distinct classes of interactions between the nitrofurans and the biological receptor: electronic (QN1 and gap energy), steric (Mor05u), and hydrophobic (MlogP). Applying the PR models to the prediction set singled out two nitrofurans (compounds 25 and 30) as the most promising for synthesis and biological assays, which in the future can be used to validate our PR models.
- molecular electrostatic potential
- chemometric techniques
- pattern recognition techniques
- design of bioactive compounds
Owing to advances in the theoretical foundations of MEP and to the development of efficient computational methods, MEP has become an important reactivity index in studies of a wide variety of molecular interactions. The usefulness of this theoretical approach in the study and interpretation of chemical, biochemical, and related phenomena is well documented [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18].
Chemometrics is a discipline that combines mathematical, statistical, information-theory, and computer-science tools to deal with complex chemical data [19, 20, 21, 22]. PR techniques were introduced into chemistry in the early 1970s to analyze various types of spectroscopic data. Since then, PR has become part of chemometrics and an excellent aid in interpreting chemical data to extract relevant information in different areas of chemical science [19, 20]. PR techniques are especially useful for classifying objects into discrete classes on the basis of measured features. The set of characteristic features of an object is regarded as an abstract pattern that contains information about a property of the object that is not directly measured.
MEP and PR techniques have been used as independent strategies in the study of active compounds and have led to the proposal of new molecules for synthesis and biological testing. Their joint application to unravel the structure–activity relationships of bioactive compounds, and thereby propose new molecules, has been described in only a few studies; a more thorough exploration of its potential is therefore needed to design biologically active compounds.
The design of molecules with a desired property is one of the objectives of chemoinformatics. In this chapter, we present a study applying MEP and PR techniques to design nitrofuran compounds with potential activity against T. cruzi. In the first step, MEP maps are used to identify the key structural features that the nitrofurans need for their activity and to investigate their probable interactions with a molecular receptor in a biological recognition process. Subsequently, PR techniques are used to construct models that are then applied to a prediction set designed with the insights gained in the MEP study.
2. MEP and chemometrics techniques as tools for the design of bioactive compounds: a brief review
According to the literature, MEP [1, 3] has been used as a quantum chemistry tool for several decades to study and understand the structure–activity relationships of molecules. Among the papers that point out the importance of this tool in this field, and consequently in the planning of bioactive compounds, are those by Bernardinelli et al. and by Jefford et al.
Chemometrics, a set of techniques, has also been used extensively over the years to understand the structure–activity relationships of molecules [25, 26, 27]. These techniques also enable the planning of new biologically active compounds, and most of the research has focused on building QSAR (quantitative structure–activity relationship) models.
The combination of MEP and chemometrics as tools for designing new bioactive compounds has almost always focused on building quantitative models, for example, with the comparative molecular field analysis (CoMFA) methodology, developed in the late 1980s by Cramer et al. CoMFA has been applied extensively and has recently been used in several structure–activity studies of bioactive compounds. Chatbar et al. studied triazine morpholino derivatives as mTOR inhibitors for the treatment of breast cancer. Pourbasheer et al. performed 3D-QSAR and 2D-QSAR analyses on a series of hepatitis C virus NS5B polymerase inhibitors. Cramer applied the CoMFA methodology to 116 biological targets and obtained acceptable 3D-QSAR models for a large majority of them. Cramer et al. introduced a novel alignment methodology for training and test set structures in 3D-QSAR. Dong et al. performed QSAR analyses of aromatic heterocycle thiosemicarbazone analogues to find novel tyrosinase inhibitors. Dong et al. built 3D-QSAR models of dabigatran analogues as thrombin inhibitors. Ding et al. built 3D-QSAR models of 6-aryl-5-cyano-pyrimidine derivatives to explore the structural requirements of LSD1 inhibitors.
We have reported in the literature [37, 38, 39, 40, 41, 42, 43] applications of MEP to investigate the key features that compounds need for their biological activities, leading to the proposal of new derivatives, combined with chemometric models that indicate the most promising of these derivatives for synthesis and biological assays. Pinheiro et al. reported the use of MEP and the partial least squares (PLS) regression method to design new artemisinin derivatives with activity against Plasmodium falciparum. Cardoso et al., using MEP maps and multivariate QSAR, designed new artemisinin derivatives with antimalarial activity. Ferreira et al. designed antimalarial artemisinins through MEP maps and multivariate analysis. Figueiredo et al. designed new dispiro-1,2,4-trioxolane derivatives with activity against falciparum malaria. Carvalho et al., using MEP maps and pattern recognition methods, proposed new artemisinin derivatives with activity against Leishmania donovani. Barbosa et al. used MEP maps and pattern recognition techniques to plan new artemisinin derivatives with anticancer activity against HepG2 cells. Cristino et al. proposed new 10-substituted deoxoartemisinin derivatives with activity against P. falciparum using MEP maps and pattern recognition techniques.
3. MEP and PR techniques as tools to design nitrofuran compounds with biological activity against T. cruzi
3.1.1 Ligand/receptor biological recognition process through the molecular electrostatic potential
The MEP is also suitable for analyzing processes based on the “recognition” of one molecule by another as in drug-receptor and enzyme-substrate interactions, because it is through their potentials that the two species first “see” each other [2, 3, 44, 45, 46].
The MEP, derived from the electron density, is a very useful property for understanding the sites of electrophilic and nucleophilic attack as well as hydrogen-bonding interactions. The MEP at a given point $\mathbf{r} = (x, y, z)$ in the vicinity of a molecule is defined as the interaction energy between the electrical charge generated by the molecule's electrons and nuclei and a positive test charge (a proton) located at $\mathbf{r}$. Being a real physical property, the MEP can be determined experimentally by diffraction methods or computationally. For the studied nitrofuran molecules, the MEP values were computed through Eq. (1):

$$V(\mathbf{r}) = \sum_{j=1}^{K} \frac{Z_j}{\lvert \mathbf{R}_j - \mathbf{r} \rvert} - \int \frac{\rho(\mathbf{r}')}{\lvert \mathbf{r}' - \mathbf{r} \rvert}\, d\mathbf{r}' \tag{1}$$
where $K$ is the number of nuclei with charges $Z_j$ located at positions $\mathbf{R}_j$, and $\rho(\mathbf{r}')$ is the electronic charge density. The first term on the right side of Eq. (1) represents the contribution of the nuclei, which is positive; the second term represents the effect of the electrons, which is negative. In the investigation of the reactive sites of the nitrofuran compounds, the MEP was evaluated at the HF/6-31G level.
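Eq. (1) can be evaluated numerically once the electron density is available on a grid. The sketch below is a minimal illustration, assuming atomic units and a density sampled on a regular grid (for example, exported from a quantum-chemistry calculation); the function name `mep` and its arguments are hypothetical, and the electronic integral is approximated by a simple Riemann sum rather than the analytic integrals used by programs such as Gaussian.

```python
import numpy as np

HARTREE_TO_KCAL_MOL = 627.509  # 1 hartree is about 627.509 kcal/mol

def mep(point, Z, R, grid, rho, dV):
    """Approximate V(r) of Eq. (1) in atomic units (hartree per unit charge, bohr).

    point : (3,) position r where the MEP is evaluated
    Z     : (K,) nuclear charges Z_j
    R     : (K, 3) nuclear positions R_j
    grid  : (N, 3) quadrature points r' for the electron density
    rho   : (N,) electron density rho(r') at those points (electrons/bohr^3)
    dV    : volume element of the grid (bohr^3)
    """
    point = np.asarray(point, dtype=float)
    # Nuclear term: sum_j Z_j / |R_j - r| (positive contribution)
    nuclear = np.sum(np.asarray(Z, float) / np.linalg.norm(np.asarray(R, float) - point, axis=1))
    # Electronic term: integral of rho(r') / |r' - r| dr' as a Riemann sum (negative contribution)
    dist = np.linalg.norm(np.asarray(grid, float) - point, axis=1)
    mask = dist > 1e-8  # skip a grid point that coincides with r (singular term)
    electronic = np.sum(np.asarray(rho, float)[mask] / dist[mask]) * dV
    return nuclear - electronic
```

Multiplying the result by `HARTREE_TO_KCAL_MOL` gives the interaction energy with a unit positive test charge in kcal/mol, the unit used for the MEP maps discussed in Section 3.2.2.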
3.1.2 PR techniques
In this section, we briefly present the PR techniques used in this chapter. More detailed descriptions of these techniques can be found elsewhere [47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66].
3.1.2.1 Principal component analysis (PCA) technique
When handling large multivariate data sets, exploratory tools are needed to find unknown trends in the data and to reduce their dimensionality. The main idea of PCA is to reduce the dimensionality of a data set consisting of a large number of interrelated variables while retaining as much as possible of the variation present in the data. This is achieved by transforming the original variables into a new set of variables, the principal components (PCs), which are uncorrelated and ordered so that the first few retain most of the variation present in all of the original variables. In this study, PCA was used to select a small number of variables (molecular properties) best related to the dependent property, here the biological activity against T. cruzi.
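A minimal PCA sketch, assuming a NumPy matrix with objects (compounds or methods) in rows and variables in columns; the data are autoscaled first, as in Section 3.2.3. The function names are illustrative and are not part of Pirouette, which the authors actually used. The PCs are obtained from the singular value decomposition of the autoscaled matrix.

```python
import numpy as np

def autoscale(X):
    """Mean-center each column and divide it by its standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca(X, n_components=2):
    """PCA of an (objects x variables) matrix after autoscaling.

    Returns scores (objects x n_components), loadings (variables x n_components)
    and the fraction of the total variance explained by each retained PC.
    """
    Xs = autoscale(X)
    # SVD: Xs = U S Vt; the rows of Vt are the PC directions (loading vectors)
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = U[:, :n_components] * S[:n_components]   # equals Xs @ loadings
    loadings = Vt[:n_components].T
    return scores, loadings, explained[:n_components]
```

Columns with zero variance must be removed beforehand, since autoscaling divides by the standard deviation.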
3.1.2.2 Hierarchical cluster analysis (HCA) technique
Together with PCA, HCA has become another important tool in pattern recognition. Its purpose is to display the data in a way that emphasizes their natural clusters and patterns in a two-dimensional space, and the results are presented as dendrograms. In HCA, the distances between objects (or variables) are calculated and converted into a similarity index that ranges from zero (no similarity, largest distance between objects) to one (identical objects).
3.1.2.3 K-nearest neighbor (KNN) technique
The KNN technique classifies objects by comparing the distances among them. The multivariate Euclidean distances between every pair of objects with known class membership are calculated, and a test object is assigned to the class held by the majority of the K training objects closest to it. The optimal K is determined by cross-validation applied to the training set objects. No assumption is made about the size or shape of the training set classes.
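A minimal sketch of KNN classification with Euclidean distances and majority voting, plus leave-one-out cross-validation over candidate values of K. The function names are hypothetical, and breaking accuracy ties in favor of the larger K is an assumption chosen to match the preference for 4NN stated in Section 3.2.3.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    """Class of x by majority vote among its k nearest training objects (Euclidean)."""
    d = np.linalg.norm(np.asarray(X_train, float) - np.asarray(x, float), axis=1)
    nearest = np.argsort(d, kind="stable")[:k]
    # Counter.most_common breaks vote ties in favor of the class met first,
    # i.e. the class of the nearer neighbor, because `nearest` is sorted by distance
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

def choose_k_loo(X, y, k_values):
    """Leave-one-out accuracy for each k; returns (best_k, {k: accuracy})."""
    X = np.asarray(X, float)
    acc = {}
    for k in k_values:
        hits = 0
        for i in range(len(X)):
            keep = np.arange(len(X)) != i
            y_keep = [y[j] for j in range(len(X)) if j != i]
            hits += knn_predict(X[keep], y_keep, X[i], k) == y[i]
        acc[k] = hits / len(X)
    best_k = max(k_values, key=lambda k: (acc[k], k))  # accuracy ties -> larger k
    return best_k, acc
```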
3.1.2.4 Stepwise discriminant analysis (SDA) technique
SDA separates objects from distinct populations and allocates new objects to previously defined populations. It uses a stepwise procedure in which, at each step, the most powerful variable is entered into the discriminant function. SDA is based on the F-test for the significance of variables: at each step it selects a variable according to its significance, so that after several steps the most significant variables have been extracted from the data set [20, 68].
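The text does not give the exact entry criterion, so the sketch below assumes a common one: Wilks' lambda Λ = det(W)/det(T), where W and T are the within-group and total sums-of-squares-and-cross-products matrices, with the partial F-to-enter F = [(n − g − p)/(g − 1)]·(1 − Λ_partial)/Λ_partial, where n is the number of objects, g the number of groups, and p the number of variables already entered. The threshold `f_to_enter=4.0` is a hypothetical default, and the variable-removal steps of a full stepwise procedure are omitted.

```python
import numpy as np

def wilks_lambda(X, y, cols):
    """Wilks' lambda = det(W) / det(T) for the variables in cols."""
    if not cols:
        return 1.0
    Xc = np.asarray(X, float)[:, cols]
    y = np.asarray(y)
    Xt = Xc - Xc.mean(axis=0)
    T = Xt.T @ Xt                                   # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(y):
        Xg = Xc[y == g] - Xc[y == g].mean(axis=0)
        W += Xg.T @ Xg                              # within-group SSCP matrix
    return np.linalg.det(W) / np.linalg.det(T)

def forward_stepwise(X, y, f_to_enter=4.0):
    """Forward selection: at each step enter the variable with the largest partial F."""
    X = np.asarray(X, float)
    n, m = X.shape
    g = len(np.unique(y))
    selected = []
    while len(selected) < m:
        p = len(selected)
        lam_p = wilks_lambda(X, y, selected)
        best = None
        for j in range(m):
            if j in selected:
                continue
            lam_partial = max(wilks_lambda(X, y, selected + [j]) / lam_p, 1e-12)
            F = (n - g - p) / (g - 1) * (1 - lam_partial) / lam_partial
            if best is None or F > best[0]:
                best = (F, j)
        if best[0] < f_to_enter:
            break                                   # no remaining variable is significant
        selected.append(best[1])
    return selected
```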
3.1.2.5 Soft independent modeling of class analogy (SIMCA) technique
The SIMCA technique develops a principal component model for each training set category, and its main objective is the reliable classification of new samples. When a prediction is made with SIMCA, new samples that are not sufficiently close to the PC space of a class are considered nonmembers of that class. The technique requires that each training sample be pre-assigned to one of Q different categories, where Q is typically greater than one. It provides three possible outcomes: the sample fits only one predefined category, it fits none of the predefined categories, or it fits more than one predefined category.
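A minimal SIMCA sketch, assuming the data are already autoscaled: each class gets its own PCA model, and a new sample is accepted by a class when the ratio of its residual variance to the class residual variance s0² is below a critical value. In practice that critical value comes from an F distribution; the fixed `f_crit=4.0` here is a hypothetical stand-in, and s0² follows one common degrees-of-freedom convention. The returned list reproduces the three outcomes described above.

```python
import numpy as np

def simca_fit(X, n_pc):
    """PCA model of one class: mean, loadings and residual variance s0^2."""
    X = np.asarray(X, float)
    n, m = X.shape
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    P = Vt[:n_pc].T                                   # m x n_pc loadings
    E = (X - mean) - (X - mean) @ P @ P.T             # residuals outside the class PC space
    s0_sq = np.sum(E**2) / ((n - n_pc - 1) * (m - n_pc))
    return mean, P, s0_sq

def simca_classify(x, models, f_crit=4.0):
    """Names of all classes whose PC space is close enough to sample x.

    Returns [] (fits no class), [name] (fits one class) or several names.
    """
    x = np.asarray(x, float)
    members = []
    for name, (mean, P, s0_sq) in models.items():
        e = (x - mean) - (x - mean) @ P @ P.T
        s_sq = np.sum(e**2) / (len(x) - P.shape[1])
        if s_sq / s0_sq < f_crit:
            members.append(name)
    return members
```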
3.1.3 Computers, software, compounds, and molecular descriptors
For the present chapter, the molecular calculations were performed on an AMD PHENOM 955 X4 2.2 GHz processor with 4 GB of RAM using the Gaussian 98 program package. The MEP was computed from the electron density and the maps were displayed with the MOLEKEL software, while the PR models were built on a Pentium PC with the Pirouette program.
Figure 1 shows the 2D structure of the 5-nitrofuran-2-aldoxime molecule used to select the method/basis set (see Section 3.1.3.1). Figures 2 and 3 display the 2D structures of the nitrofuran compounds of the training [73, 74, 75] and prediction sets, respectively. In this work, nitrofurans were defined as more active against T. cruzi when their in vitro growth rate inhibition (GR) of T. cruzi was ≥ 75%, and as less active when it was < 75%.
In general, the structure–activity relationships show that, for compounds 1–6, lengthening the carbon chain improves the activity against T. cruzi. Comparing compounds 3 and 2 shows that replacing the N atom with O increases activity. Increasing the number of unsaturations and reintroducing N into the chain decreases the biological activity (7 and 8). Relative to compound 1, increasing the unsaturations, reintroducing the O atom, and lengthening the carbon chain (9–12) substantially increase the activity against T. cruzi. On the other hand, in compounds 13 and 14, returning to a single unsaturation in the main chain and introducing electron-withdrawing groups and more electronegative atoms decreases the anti-T. cruzi activity. The same trend is observed for compounds 16, 17, and 19–22.
The molecular descriptors were obtained for the most stable conformation of each compound. These descriptors were computed to provide information about the influence of electronic, steric, hydrophilic, and hydrophobic features on the antitrypanosomal activity of the studied nitrofurans. The atomic charges were derived from the electrostatic potential obtained at the HF/6-31G level as implemented in the Gaussian program package. Such ESP charges are obtained by fitting a set of point atomic charges so that they reproduce, as closely as possible, the quantum MEP at a set of points defined around the molecule [76, 77]. Charges derived from the electrostatic potential have the advantage of being, in general, physically more satisfactory than Mulliken charges, especially with regard to biological activity.
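ESP-derived charges are the least-squares solution of a small linear system: the charges q_i at the atomic positions should reproduce the quantum MEP V_k at the fitting points, subject to a fixed total charge, which is imposed here with a Lagrange multiplier. The sketch below assumes atomic units and that the fitting points and their MEP values are already available. The chapter does not say which point-selection scheme was used (Gaussian offers, for example, `Pop=MK` and `Pop=CHelpG`), so that step is not shown, and the function name is hypothetical.

```python
import numpy as np

def fit_esp_charges(atoms, points, V, total_charge=0.0):
    """Least-squares point charges q_i that reproduce the MEP V at the given points,
    constrained so that sum(q_i) == total_charge (atomic units)."""
    atoms = np.asarray(atoms, float)
    points = np.asarray(points, float)
    # A[k, i] = 1 / r_ik, the potential at point k from a unit charge on atom i
    A = 1.0 / np.linalg.norm(points[:, None, :] - atoms[None, :, :], axis=-1)
    n = len(atoms)
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = A.T @ A
    M[:n, n] = M[n, :n] = 1.0        # row/column of the Lagrange multiplier for sum(q) = Q
    rhs = np.append(A.T @ np.asarray(V, float), total_charge)
    return np.linalg.solve(M, rhs)[:n]
```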
The quantum–chemical descriptors obtained with the Gaussian 98 program package were the total energy of the molecule (TE), the highest occupied molecular orbital (HOMO) energy, the energy of the orbital one level below the HOMO (HOMO−1), the lowest unoccupied molecular orbital (LUMO) energy, the energy of the orbital one level above the LUMO (LUMO+1), the HOMO–LUMO energy difference (gap energy), the total dipole moment (μ), Mulliken's electronegativity (χ), the atomic charge on the Nth atom (QN), the molecular hardness (HD), and the molecular softness (MS).
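The chapter does not spell out how χ, HD, and MS were obtained from the orbital energies. A common choice, assumed here, is the Koopmans-type approximation χ = −(E_HOMO + E_LUMO)/2, η = (E_LUMO − E_HOMO)/2, and S = 1/η; note that some authors define softness as 1/(2η). The hypothetical helper below converts hartree to kcal/mol, the unit of the gap energy in Table 2.

```python
HARTREE_TO_KCAL_MOL = 627.509

def orbital_descriptors(e_homo, e_lumo):
    """Koopmans-type descriptors from frontier orbital energies (hartree in, kcal/mol out)."""
    homo = e_homo * HARTREE_TO_KCAL_MOL
    lumo = e_lumo * HARTREE_TO_KCAL_MOL
    gap = lumo - homo                  # HOMO-LUMO gap energy
    chi = -(homo + lumo) / 2           # Mulliken electronegativity
    hardness = (lumo - homo) / 2       # molecular hardness
    softness = 1 / hardness            # molecular softness (some authors use 1 / (2 * hardness))
    return {"gap": gap, "chi": chi, "hardness": hardness, "softness": softness}
```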
The physicochemical descriptors obtained with the ChemPlus module were total surface area (TSA), molecular volume (VOL), molecular refractivity (MR), and molecular hydration energy (MHE).
Molecular holistic (MH) descriptors were included to represent different sources of chemical information in terms of molecular size, symmetry, and atom distribution in the molecules. We also included topological indices, connectivity indices, geometric descriptors, 3D-MoRSE descriptors, and the Moriguchi octanol–water partition coefficient (MlogP). These descriptors were calculated with the Dragon software.
3.1.3.1 Theoretical approach and basis set used in the molecular calculations
In the calculations on the reference compound 5-nitrofuran-2-aldoxime (Figure 1), several quantum–chemical approaches were used [81, 82, 83, 84, 85, 86, 87]: the B3LYP hybrid functional (Becke's three-parameter hybrid exchange combined with the Lee–Yang–Parr (LYP) correlation functional), BLYP (Becke's 1988 exchange functional combined with LYP), the Hartree–Fock (HF) method, the semiempirical Austin Model 1 (AM1) and Parametric Method 3 (PM3), and standard basis sets available in the Gaussian program package. The geometry of 5-nitrofuran-2-aldoxime was optimized at the B3LYP/6-21G, B3LYP/6-21G*, B3LYP/6-31G, B3LYP/6-31G*, BLYP/6-21G, BLYP/6-21G*, BLYP/6-31G, BLYP/6-31G*, HF/6-21G, HF/6-21G*, HF/6-31G, and HF/6-31G* levels [81, 82, 83, 84] and with the AM1 and PM3 methods [85, 86]. These calculations were performed to find the approach and basis set offering the best compromise between computational time and accuracy relative to the experimental data. The experimental structure of 5-nitrofuran-2-aldoxime was retrieved from the Cambridge Structural Database (CSD). PCA and HCA were used to compare the structures computed with the different methods/basis sets with the experimental structure in order to identify the appropriate method and basis set for further calculations. The analyses were carried out on an autoscaled data matrix of dimension 26 × 5, in which each row corresponded to one of the computed geometries or to the experimental geometry, and each column to one of five geometrical parameters (bond lengths and bond angles) of 5-nitrofuran-2-aldoxime. The HF/6-31G method was selected to compute all structures and molecular properties (see Results and discussion section), and the initial geometries of the nitrofurans (Figures 2 and 3) were built from the geometry of 5-nitrofuran-2-aldoxime optimized at the level selected by PCA and HCA.
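The selection step can be illustrated by autoscaling the geometric-parameter matrix, projecting it onto the first two PCs, and ranking each method by its distance to the experimental row in that score space, which is how a scores plot such as Figure 4 is read. This is a simplified, hypothetical helper (`rank_methods`), not the Pirouette workflow, and it does not replace the HCA check.

```python
import numpy as np

def rank_methods(X, labels, exp_label="Exp", n_pc=2):
    """Rank computed geometries by their distance to the experimental one
    in the space of the first n_pc PCA scores of the autoscaled matrix."""
    X = np.asarray(X, float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)      # autoscaling
    U, S, _ = np.linalg.svd(Xs, full_matrices=False)
    scores = U[:, :n_pc] * S[:n_pc]                         # PC1, PC2, ... scores
    e = labels.index(exp_label)
    dist = np.linalg.norm(scores - scores[e], axis=1)
    order = [i for i in np.argsort(dist) if i != e]         # closest method first
    return [(labels[i], float(dist[i])) for i in order]
```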
A conformational analysis of each compound was carried out with the MM+ force field, and the lowest-energy conformation was then submitted to a conformational search with the Gaussian program.
3.2 Results and discussion
3.2.1 Quantum–chemical approach and basis set selection for the description of the geometries of nitrofurans
The advantage of using PCA and HCA in this step is that all structural information is considered simultaneously, taking into account the correlations among the parameters. Table 1 shows the theoretical and experimental structural parameters (bond lengths and bond angles) of 5-nitrofuran-2-aldoxime; these data were used to select, by PCA and HCA, the quantum–chemical approach and basis set that give results closest to the experimental data.
| Geometric parameters | B3LYP/6-21G | B3LYP/6-21G* | B3LYP/6-31G | B3LYP/6-31G* | BLYP/6-21G | BLYP/6-21G* | BLYP/6-31G | BLYP/6-31G* | HF/6-21G | HF/6-21G* | HF/6-31G | HF/6-31G* | AM1 | PM3 | Exp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bond length (Å) |
| Bond angle (°) |
The first two principal components explain 86.02% of the original information (PC1 = 58.01% and PC2 = 28.02%). The PC1 versus PC2 scores plot is shown in Figure 4, where the methods are discriminated into two classes along PC2: the semiempirical approaches (AM1 and PM3) lie at the top of the plot, while the other approaches (HF, BLYP, and B3LYP) and the experimental data lie at the bottom. Moreover, the HF/6-31G approach/basis set is the closest to the experimental data, indicating that it should be used in the development of this work.
To further investigate the most appropriate approach and basis set, we also used HCA. Figure 5 shows the dendrogram obtained with the complete linkage method; the theoretical approaches are distributed in the same way as in PCA, i.e., HCA confirms the PCA results. Again, the HF/6-31G approach/basis set is the closest to the experimental data and is therefore the most suitable for this work.
3.2.2 MEP maps for compounds of the training set
Figure 6 shows the MEP maps for the nitrofurans in the training set. The analysis of these maps reveals that the most active compounds, in general, have the following characteristics:
(i) Compounds with one unsaturation and an O atom adjacent to the carbonyl in the carbon chain present greater electron density near the furan ring as the chain length decreases. In these compounds (4, 5, and 6), the MEP maps show negative regions ranging from −82.99 to −4.87 kcal/mol. In the most active compound (6), the most negative values are located on the nitro group, the O atom of the furan ring, and the O atoms of the ester group (red and yellow). The MEP maps of these compounds also exhibit positive regions between +4.54 and +76.96 kcal/mol (green and blue). Compounds with two unsaturations and an N atom adjacent to the carbonyl show increasing electron density as the carbon chain lengthens. In the most active of them (7), the MEP map shows a negative region between −77.74 and −1.31 kcal/mol, with the electron density concentrated mainly on the atoms of the nitro group, the O atom of the furan ring, and the N and O atoms of the amide group (red and yellow). According to the MEP maps, these compounds present positive MEP values between +5.64 and +61.21 kcal/mol (green and blue).
(ii) In compounds with two unsaturations and an O atom adjacent to the carbonyl, lengthening the carbon chain increases the electron density on the atoms of the nitro group, which extends through the O atom of the furan ring to the O atoms of the ester group along the unsaturated chain. In these compounds (10–12), the MEP maps exhibit negative values between −76.18 and −6.36 kcal/mol (red and yellow) and positive values from +0.63 to +67.42 kcal/mol (green and blue).
(iii) The compound with one unsaturation, an N atom adjacent to the carbonyl in the carbon chain, and bulky substituents (23) has higher electron density in the vicinity of the furan ring and on the N and O atoms of the amide group. Its MEP map shows a negative region (red and yellow) between −73.10 and −1.59 kcal/mol on these atoms and a positive region between +5.56 and +69.91 kcal/mol (green and blue).

The electron density around the nitro group, the O atom of the furan ring, and other atoms may be responsible for the antitrypanosomal activity of the nitrofurans, suggesting that these regions form complexes with the active site of the receptor in a biological recognition process.
From the above discussion, as a rule, more active nitrofurans can be designed by starting from one of the basic structures of the most active compounds and introducing electron-donating groups or substituents that enhance the key structural features necessary for activity.
3.2.3 Chemometric modeling
To perform the chemometric modeling, all variables were autoscaled (mean-centered and divided by their standard deviations) so that they were standardized and had the same importance regardless of scale. Furthermore, given the large quantity of multivariate data available, the number of variables had to be reduced. Thus, if two descriptors had a high Pearson correlation coefficient (r > 0.8), one of the two was excluded from the matrix at random, since such descriptors essentially describe the same property; if both are also correlated with the antitrypanosomal activity, only one of them is needed as an independent variable in a predictive model.
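The two preprocessing steps can be sketched as follows, assuming descriptors in the columns of a NumPy array. Which member of a correlated pair is dropped is chosen with a seeded random generator to mirror the random exclusion described above; the seed, the use of |r|, and the function names are illustrative assumptions. Autoscaling does not change Pearson correlations, so the order of the two steps does not matter.

```python
import numpy as np

def autoscale(X):
    """Mean-center each descriptor and divide it by its standard deviation."""
    X = np.asarray(X, float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def drop_correlated(X, names, r_max=0.8, seed=0):
    """For every pair of descriptors with |r| > r_max, drop one of the two at random.

    Returns the names of the kept descriptors and their autoscaled columns.
    """
    R = np.corrcoef(np.asarray(X, float), rowvar=False)   # Pearson correlation matrix
    rng = np.random.default_rng(seed)
    keep = list(range(len(names)))
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if i in keep and j in keep and abs(R[i, j]) > r_max:
                keep.remove(int(rng.choice([i, j])))
    return [names[k] for k in keep], autoscale(np.asarray(X, float)[:, keep])
```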
3.2.3.1 PCA model
Four molecular descriptors were selected for the PCA model. Table 2 lists, for the 23 nitrofurans, the values of these descriptors (QN1, gap energy, Mor05u, and MlogP), the experimental in vitro T. cruzi growth inhibition, and the activity class, together with the correlation matrix of all data. All correlations between descriptors are below 0.786. The first three principal components (PCs) describe 96.48% of the original information: PC1 = 45.70%, PC2 = 30.91%, and PC3 = 19.87%. The PC1–PC2 scores plot for the samples is shown in Figure 7, where the nitrofurans are distributed into two distinct regions along PC1: the more active compounds (4–7, 10–12, 18, and 23) on the left side and the less active ones (1–3, 8, 9, 13–17, and 19–22) on the right side. According to Figure 8, the MlogP descriptor is responsible for placing the more active compounds on the left side, while the gap energy, QN1, and Mor05u descriptors place the less active compounds on the right side.
| Nitrofurans | QN1 | Gap energy (kcal/mol) | Mor05u | MlogP | % in vitro T. cruzi growth inhibition (a, b) | Activity (c) |
|---|---|---|---|---|---|---|
Table 3 shows the loading vectors for PC1, PC2, and PC3. According to this table, PC1 can be expressed through the following equation:
From this equation, more active nitrofurans can, in general, be obtained by combining lower values of QN1, gap energy, and Mor05u with higher values of MlogP.
3.2.3.2 HCA model
The results of the HCA model, displayed as the dendrogram in Figure 9, are similar to those of the PCA model. The nitrofurans are fairly well grouped according to their activity: the two clusters (+ and −) mirror the two classes displayed by the PCA model (Figure 7).
3.2.3.3 KNN model
Table 4 shows the results of the KNN models built with one (1NN) to four (4NN) nearest neighbors. All models classified 100% of the compounds correctly. We adopted the 4NN model because the reliability of KNN classification generally improves with the number of nearest neighbors, and this model was used to validate the training set (Figure 2).
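| Category | Number of compounds | Incorrectly classified (1NN) | Incorrectly classified (2NN) | Incorrectly classified (3NN) | Incorrectly classified (4NN) |
|---|---|---|---|---|---|
| Class: more active | 9 | 0 | 0 | 0 | 0 |
| Class: less active | 14 | 0 | 0 | 0 | 0 |
| % Correct information | — | 100 | 100 | 100 | 100 |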
3.2.3.4 SDA model
In the construction of the SDA model, the discrimination functions for groups more active and less active, respectively, are given below:
Group MA (more active):
Group LA (less active):
Using the discrimination functions, Eqs. (3) and (4), and the descriptor values of each nitrofuran, we obtained the classification matrix for all compounds of the training set (Table 5). The classification error was 0.00%, indicating a satisfactory separation of more active and less active compounds. From the SDA model, the following allocation rule was derived for investigating the activity of new nitrofurans against T. cruzi: (a) calculate, for the new compound, the values of the most important descriptors obtained in the construction of the SDA model; (b) substitute these autoscaled values into the two discrimination functions obtained in this work; and (c) check which discrimination function, Eq. (3) or Eq. (4), gives the higher value. The new compound is classified as more active if the function of the more active group gives the higher value, and as less active otherwise.
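The coefficients of Eqs. (3) and (4) are not reproduced here, but functions of that form can be sketched as Fisher linear classification functions computed from the pooled within-group covariance matrix. This is an assumption about how the SDA software builds them, including the ln-prior term (with equal group sizes it shifts both functions by the same constant and does not affect the allocation). The hypothetical `allocate` helper implements step (c) of the rule; its input should already be autoscaled with the training-set means and standard deviations, as in step (b).

```python
import numpy as np

def classification_functions(X, y):
    """Linear classification functions, one per group: f_g(x) = c0_g + c_g . x.

    c_g = S^-1 mean_g and c0_g = -0.5 mean_g . S^-1 mean_g + ln(prior_g),
    where S is the pooled within-group covariance matrix.
    """
    X = np.asarray(X, float)
    y = np.asarray(y)
    groups = list(np.unique(y))
    n, m = X.shape
    S = np.zeros((m, m))
    for g in groups:
        Xg = X[y == g] - X[y == g].mean(axis=0)
        S += Xg.T @ Xg
    S /= n - len(groups)                         # pooled within-group covariance
    S_inv = np.linalg.inv(S)
    funcs = {}
    for g in groups:
        mu = X[y == g].mean(axis=0)
        c = S_inv @ mu
        c0 = -0.5 * mu @ c + np.log(np.mean(y == g))   # prior = group proportion
        funcs[g] = (c0, c)
    return funcs

def allocate(x, funcs):
    """Assign x to the group whose classification function gives the highest value."""
    x = np.asarray(x, float)
    return max(funcs, key=lambda g: funcs[g][0] + funcs[g][1] @ x)
```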
| Classification group or class | Number of compounds | More active | Less active |
|---|---|---|---|
| Group (class): more active | 9 | 9 | 0 |
| Group (class): less active | 14 | 0 | 14 |
| % Correct information | — | 100 | 100 |
To check the reliability of the model, the leave-one-out technique was employed: one nitrofuran is excluded from the data set, the remaining compounds are used to build the classification functions, and the removed compound is then classified with these functions. In the next step, the omitted compound is returned to the set and another nitrofuran is removed, and the procedure continues until every compound has been left out once. Table 6 summarizes the results obtained with this cross-validation.
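The leave-one-out loop described above can be written independently of the classifier. In this sketch, `fit` and `predict` are any pair of callables (hypothetical names); a nearest-centroid rule is included purely as a stand-in so that the example runs, not as the SDA model itself. The result is a classification matrix with the same layout as Table 6.

```python
import numpy as np

def loo_classification_matrix(X, y, fit, predict):
    """Leave-one-out cross-validation returning counts[true_class][predicted_class]."""
    X = np.asarray(X, float)
    classes = sorted(set(y))
    counts = {t: {p: 0 for p in classes} for t in classes}
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        model = fit(X[mask], [y[j] for j in range(len(y)) if j != i])   # refit without compound i
        counts[y[i]][predict(model, X[i])] += 1                         # classify the omitted compound
    return counts

# Stand-in classifier for illustration only: nearest class centroid.
def centroid_fit(X, y):
    return {c: X[[k for k, t in enumerate(y) if t == c]].mean(axis=0) for c in set(y)}

def centroid_predict(model, x):
    return min(model, key=lambda c: np.linalg.norm(x - model[c]))
```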
| Classification group or class | Number of compounds | More active | Less active |
|---|---|---|---|
| Group (class): more active | 9 | 9 | 0 |
| Group (class): less active | 14 | 0 | 14 |
| % Correct information | — | 100 | 100 |
3.2.3.5 SIMCA model
The SIMCA model was built with the same descriptors as the PCA, HCA, KNN, and SDA models, using two PCs to model the two classes: more active (4–7, 10–12, 18, and 23) and less active (1–3, 8, 9, 13–17, and 19–22) nitrofurans. Table 7 shows the results for the SIMCA model; in this case, the percentage of correct classification was also 100%. The PCA, HCA, KNN, SDA, and SIMCA models therefore indicate that QN1, gap energy, Mor05u, and MlogP are key properties for explaining the anti-T. cruzi activity of the nitrofurans in the training set (Figure 2).
| Category | Number of compounds | Correct classification |
|---|---|---|
| Class: more active | 9 | 9 |
| Class: less active | 14 | 14 |
| % Correct information | — | 100 |
Since QN1, gap energy, Mor05u, and MlogP were selected in the chemometric modeling as the most important properties for describing the antitrypanosomal activity, some considerations about them may help to understand the behavior of the more active nitrofurans. According to classical chemical theory, chemical interactions can be classified into two categories: electrostatic (polar) or orbital (covalent). Electrical charges in the molecule are the driving force of electrostatic interactions. It has been demonstrated that local electron densities or charges are important in many chemical reactions, physicochemical properties, and ligand–receptor interactions [89, 90]. Thus, charge-based parameters have been widely employed as chemical reactivity indices or as measures of weak intermolecular interactions, and many quantum–chemical descriptors are derived from the partial charge distribution in a molecule or from the electron densities on particular atoms. From Table 2, we can observe that, in general, the more active analogues present lower QN1 values than the less active ones. This suggests that biological processes may occur through electrostatic interactions between the more active nitrofurans and a putative biological receptor.
The gap energy is an important stability index: a large gap implies high stability of the molecule, in the sense of lower reactivity in chemical reactions. It approximates the lowest excitation energy of the molecule and can be used to define absolute and activation hardness [89, 90]. Table 2 shows that, in general, the more active nitrofurans present lower gap energies than the less active ones. This suggests that the more active nitrofurans are more likely to interact with the biological receptor through a charge-transfer mechanism.
Mor05u is a 3D-MoRSE descriptor, based on obtaining information from the 3D atomic coordinates through the transform used in electron diffraction studies, and is closely related to the stereochemistry of the compounds. According to Table 2, the more active nitrofurans present lower Mor05u values. This may indicate that stereochemical properties play a role in a possible mechanism of action of the more active nitrofurans.
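For readers unfamiliar with 3D-MoRSE descriptors, the signal is I(s) = Σ_{i<j} A_i A_j sin(s r_ij)/(s r_ij), where r_ij are interatomic distances and A_i are atomic weights (A_i = 1 in the unweighted variant). The sketch assumes, as in the original 3D-MoRSE definition, that the 32 signals are sampled at s = 0, 1, 2, ..., 31 Å⁻¹, so that signal 05 corresponds to s = 4 Å⁻¹; the exact Dragon conventions (for example, whether hydrogens are included) are not stated in the chapter, and the function names are hypothetical.

```python
import numpy as np

def morse_signal(coords, s, weights=None):
    """3D-MoRSE value I(s) = sum_{i<j} w_i w_j sin(s r_ij) / (s r_ij).

    coords in angstrom, s in 1/angstrom; weights=None gives the unweighted ('u') variant.
    """
    coords = np.asarray(coords, float)
    w = np.ones(len(coords)) if weights is None else np.asarray(weights, float)
    i, j = np.triu_indices(len(coords), k=1)          # all atom pairs i < j
    r = np.linalg.norm(coords[i] - coords[j], axis=1)
    # np.sinc(t) = sin(pi t) / (pi t), so sin(x) / x = np.sinc(x / pi); this also handles s = 0
    return float(np.sum(w[i] * w[j] * np.sinc(s * r / np.pi)))

def mor05u(coords):
    # Assumption: signals sampled at s = 0, 1, ..., 31 1/angstrom, so signal 05 is s = 4
    return morse_signal(coords, s=4.0)
```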
MlogP is an important hydrophobicity descriptor in diverse biochemical, pharmacological, and toxicological processes, including drug absorption. As Table 2 shows, the more active nitrofurans exhibit higher MlogP values. This indicates that hydrophobic interactions may be important in the mechanism of action of these compounds with a biological receptor.
Knowing the performance of the PR models constructed for the 23 studied nitrofurans, we decided to apply them to a series of eight compounds (Figure 3) designed to retain the key structural features that the MEP maps of the training-set compounds showed to be necessary for biological activity. The basic nucleus of these compounds corresponds to that of the most active nitrofurans with double unsaturation, which contain an O atom vicinal to the carbonyl group (see compounds 10–12). The eight molecules proposed for the activity-prediction study were drawn with the help of one of the collaborators of this work, who belongs to the organic chemistry research group of the Federal University of Pará, Brazil, and the syntheses of the most promising compounds are in progress. In the future, antitrypanosomal tests with the most promising nitrofurans can be used to validate our PR models.
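The model settings are not given in this section, so the following is only a minimal sketch of how a KNN classifier trained on autoscaled descriptors could be applied to a newly designed compound. The choice of k = 3, Euclidean distance, autoscaling with training-set statistics, and all descriptor values are assumptions for illustration, not the parameters used for the published models.

```python
import math
from collections import Counter
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def autoscale_params(X: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Column means and sample standard deviations of the training matrix."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) or 1.0 for c in cols]  # guard against zero-variance columns
    return mus, sds


def scale(x: Sequence[float], mus: List[float], sds: List[float]) -> List[float]:
    """Autoscale one descriptor vector with the training-set statistics."""
    return [(v - m) / s for v, m, s in zip(x, mus, sds)]


def knn_predict(X_train, y_train, x_new, k: int = 3) -> str:
    """Majority vote among the k training compounds closest to x_new."""
    mus, sds = autoscale_params(X_train)
    Z = [scale(x, mus, sds) for x in X_train]
    z = scale(x_new, mus, sds)
    nearest = sorted(range(len(Z)), key=lambda i: math.dist(Z[i], z))[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: [QN1, gap (kcal/mol), Mor05u, MlogP] per compound.
X_train = [
    [0.10, 160.0, -1.2, 1.5], [0.12, 165.0, -1.0, 1.8], [0.11, 162.0, -1.1, 1.6],
    [0.20, 180.0, -0.4, 0.6], [0.22, 178.0, -0.2, 0.9], [0.21, 181.0, -0.3, 0.7],
]
y_train = ["more"] * 3 + ["less"] * 3
print(knn_predict(X_train, y_train, [0.11, 161.0, -1.1, 1.7]))  # more
```

Scaling the new compound with the training-set means and standard deviations keeps it in the same descriptor space as the training data; an odd k avoids ties between the two classes.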
The results of applying the PR models (PCA, HCA, KNN, SDA, and SIMCA) to the compounds of the prediction set, and the descriptor values for these compounds, are summarized in Tables 8 and 9, respectively. In Table 8, compounds 25 and 30 were predicted as more active against T. cruzi by all five models. Only the KNN model predicted compound 26 as more active, whereas only the PCA and HCA models predicted compound 31 as more active. On the other hand, all models except SDA predicted compounds 24, 27, and 28 as more active. In turn, the SIMCA model did not assign compounds 29 and 31 to either of the two classes. Thus, nitrofurans 25 and 30 can be considered potentially more active in a future test against T. cruzi. From the values reported for compounds 25 and 30 (Table 9), it can be seen that, to design more active nitrofurans, smaller values of QN1, gap energy, and Mor05u should be combined with a higher value of MlogP.
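The reasoning that singles out compounds 25 and 30 amounts to a consensus vote across the five classifiers. A minimal sketch of that bookkeeping is shown below; the compound labels and predictions are hypothetical, and `None` stands for a compound that a model (for example SIMCA) does not assign to either class.

```python
from typing import Dict, List, Optional

MODELS = ("PCA", "HCA", "KNN", "SDA", "SIMCA")
Predictions = Dict[str, Dict[str, Optional[str]]]


def votes_for_more_active(preds: Predictions) -> Dict[str, int]:
    """Number of models that place each compound in the 'more' active class."""
    return {cpd: sum(1 for m in MODELS if p.get(m) == "more") for cpd, p in preds.items()}


def unanimous_more_active(preds: Predictions) -> List[str]:
    """Compounds classified as more active by every model."""
    return sorted(c for c, n in votes_for_more_active(preds).items() if n == len(MODELS))


# Hypothetical predictions for three designed compounds.
example: Predictions = {
    "X1": {"PCA": "more", "HCA": "more", "KNN": "more", "SDA": "more", "SIMCA": "more"},
    "X2": {"PCA": "less", "HCA": "less", "KNN": "more", "SDA": "less", "SIMCA": "less"},
    "X3": {"PCA": "more", "HCA": "more", "KNN": "less", "SDA": "less", "SIMCA": None},
}
print(votes_for_more_active(example))  # {'X1': 5, 'X2': 1, 'X3': 2}
print(unanimous_more_active(example))  # ['X1']
```

Requiring unanimity is the strictest rule; a majority threshold would be an alternative, but it would also admit compounds on which the models disagree.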
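Table 8. Classification of the prediction-set nitrofurans by the PCA, HCA, KNN, SDA, and SIMCA models.

| Nitrofuran | PCA model | HCA model | KNN model | SDA model | SIMCA model |
|---|---|---|---|---|---|

Table 9. Descriptor values for the prediction-set nitrofurans.

| Nitrofuran | QN1 | Gap energy (kcal/mol) | Mor05u | MlogP |
|---|---|---|---|---|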
3.2.4 MEP maps for compounds of the prediction set
Figure 10 shows the MEP maps for the most active nitrofurans in the validation set (25 and 30). As can be seen, in these compounds too, extending the carbon chain increases the electron density on the atoms of the nitro group, and this region extends through the O atom of the furan ring to the O atoms of the ester group attached to the unsaturated chain. The MEP maps of these compounds show negative values between −74.27 and −1.76 kcal/mol (red and yellow) and positive values between +4.84 and +57.58 kcal/mol (green and blue).
As in the more active compounds of the training set, the negative MEP region of compounds 25 and 30 is susceptible to electrophilic attack in a biological recognition process.
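The MEP values quoted above come from quantum-chemical calculations. As a rough illustration of what the sign of the MEP means, the sketch below evaluates the potential at a point from atomic partial charges, V(r) ≈ k Σ_i q_i/|r − R_i|, with k ≈ 332.06 kcal·Å/(mol·e²). This point-charge approximation is our simplification, not the method used to generate Figure 10, and the charges and coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

COULOMB_KCAL = 332.0637  # kcal*angstrom/(mol*e^2)
Point = Tuple[float, float, float]


def mep_point_charges(charges: Sequence[float], coords: Sequence[Point], point: Point) -> float:
    """Point-charge estimate of the potential (kcal/mol per unit positive charge) at `point`."""
    v = 0.0
    for q, xyz in zip(charges, coords):
        r = math.dist(xyz, point)
        if r < 1e-8:
            raise ValueError("evaluation point coincides with an atom")
        v += q / r
    return COULOMB_KCAL * v


def mep_region(value: float) -> str:
    """Negative regions attract electrophiles; positive regions attract nucleophiles."""
    if value < 0:
        return "negative"
    if value > 0:
        return "positive"
    return "neutral"


# Hypothetical nitro-like fragment: N(+0.6) flanked by two O(-0.4) atoms.
charges = [0.6, -0.4, -0.4]
coords = [(0.0, 0.0, 0.0), (1.1, 0.6, 0.0), (-1.1, 0.6, 0.0)]
probe = (0.0, 2.5, 0.0)
v = mep_point_charges(charges, coords, probe)
print(f"{v:.1f} kcal/mol ->", mep_region(v))
```

A probe placed beyond the two O atoms sees a negative potential, mirroring the red and yellow regions around the nitro group in the maps.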
3.3 Concluding remarks
Over the last decades, MEP and chemometric techniques have become efficient tools for studying the structure–activity relationships of bioactive molecules. These tools have been used either on their own or in combination, to unravel information about the structure–activity relationships of pharmacologically interesting compounds more efficiently. This chapter falls within the second approach. MEP maps were constructed for 23 nitrofurans with activity against T. cruzi reported in the literature. The key structural features required for antitrypanosomal activity, together with chemical intuition, guided the introduction of substituents into one of the most active nitrofurans in the training set to obtain eight new derivatives.
PR models (PCA, HCA, KNN, SDA, and SIMCA) were constructed and demonstrated that the 23 nitrofurans can be classified into two classes, more active and less active, according to their degree of activity against T. cruzi. The properties QN1, gap energy, Mor05u, and MlogP are responsible for classifying the studied nitrofurans as more or less active. Notably, these properties represent three distinct classes of interactions between the nitrofurans and the biological receptor: electronic (QN1 and gap energy), steric (Mor05u), and hydrophobic (MlogP). It is worth mentioning that Paulino et al., studying the influence of molecular parameters on the activity of 5-nitrofurans against T. cruzi, reported the importance of electronic properties and molecular hydrophobicity, as well as of variations in the electronic structure of the nitrofurans, to explain the greater activity of these compounds as inhibitors of the growth of this protozoan.
The application of the PR models to the validation set identified two nitrofurans (25 and 30) as the most promising candidates for synthesis and biological assays, which in the future can be used to validate our PR models.
We gratefully acknowledge the financial support of the Brazilian agencies: Conselho Nacional de Desenvolvimento Científico e Tecnológico and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior. The authors would like to thank the Virtual Computational Chemistry Laboratory (VCCLAB–Munich) and the Swiss Center for Scientific Computing for the use of the DRAGON and MOLEKEL software, respectively. We employed computing facilities at the Laboratório de Química Teórica e Computacional (LQTC)–Universidade Federal do Pará.