Experimental and theoretical structural parameters of 5-nitrofuran-2-aldoxime.
In this chapter, we first briefly review quantum chemistry approximations, the molecular electrostatic potential (MEP), and chemometric techniques, which are recognized as important tools in the development of chemical science and are frequently used in the study and design of bioactive compounds. We then use MEP and pattern recognition (PR) techniques to design nitrofuran compounds with biological activity against Trypanosoma cruzi (T. cruzi). PR models (PCA, HCA, KNN, SDA, and SIMCA) were constructed and showed that 23 nitrofurans can be classified into two classes, more active and less active, according to their degree of activity against T. cruzi. Four properties are responsible for this classification: the charge on the N atom of the nitro group (QN1); the difference between the highest occupied molecular orbital (HOMO) energy and the lowest unoccupied molecular orbital (LUMO) energy (gap energy); the molecular representation of structure based on electron diffraction code of signal 5, unweighted (Mor05u); and the Moriguchi octanol–water partition coefficient (MlogP). Notably, these properties represent three distinct classes of interactions between the nitrofurans and the biological receptor: electronic (QN1 and gap energy), steric (Mor05u), and hydrophobic (MlogP). Applying the PR models to the validation set identified two nitrofuran compounds (compounds 25 and 30) as the most promising for synthesis and biological assays, which can in future be used to validate our PR models.
- molecular electrostatic potential
- chemometric techniques
- pattern recognition techniques
- design of bioactive compounds
Reports on the theoretical basis of MEP and on the development of efficient computational methods show that MEP has become an important reactivity index in studies of a large variety of molecular interactions. The usefulness of this theoretical approach in the study and interpretation of chemical, biochemical, and related phenomena is well documented [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18].
Chemometrics is a discipline that brings together tools from mathematics, statistics, information theory, and computer science to deal with complex chemical data [19, 20, 21, 22]. PR techniques were introduced into chemistry at the beginning of the 1970s to analyze various types of spectroscopic data. Since then, PR has become part of chemometrics and has been an excellent tool to aid the interpretation of chemical data and to obtain relevant information in different areas of chemical science [19, 20]. PR techniques are especially useful for classifying objects into discrete classes on the basis of measured features. A set of characteristic features of an object is treated as an abstract pattern that contains information about a property of the object that is not directly measured.
MEP and PR techniques have been used as independent strategies in the study of active compounds and have led to the proposal of new molecules for synthesis and biological testing. The joint applications of these powerful tools were described carefully, to unravel the structure-activity relationship of bioactive compounds and consequently propose new molecules. Therefore, a more intense exploration of their combined potential is needed in order to design biologically active compounds.
The design of molecules with a desired property is one of the objectives of chemoinformatics. In this chapter, we present a study of the application of MEP and PR techniques to design nitrofuran compounds with potential activity against
2. MEP and chemometrics techniques as tools for the design of bioactive compounds: a brief review
According to the literature, MEP [1, 3] has been used by researchers as a quantum chemistry tool for several decades to study and understand the relationships between the structure and activity of molecules. Among the papers that point out the importance of this tool in this field, and consequently in the planning of bioactive compounds, we can mention those reported by Bernardinelli et al. and by Jefford et al.
Another tool, in the form of a set of techniques that has been used extensively over the years to understand the structure-activity relationship of molecules, is chemometrics [25, 26, 27]. This set of techniques has also enabled the planning of new biologically active compounds, and most of the research carried out has focused on the construction of QSAR (quantitative structure-activity relationship) models.
The combination of MEP and chemometrics as tools for designing new bioactive compounds has almost always focused on the elaboration of quantitative models, for example, the CoMFA methodology. This methodology was developed in the late 1980s by Cramer et al. It has been applied extensively and has recently been used in several studies of structure–activity relationships of bioactive compounds. Chatbar et al. conducted a study of triazine morpholino derivatives as mTOR inhibitors for the treatment of breast cancer. Pourbasheer et al. performed 3D-QSAR and 2D-QSAR analyses on a series of hepatitis C virus NS5B polymerase inhibitors. Cramer applied the CoMFA methodology to 116 biological targets and obtained acceptable 3D-QSAR models for a large majority of them. Cramer et al. introduced a novel alignment methodology for training or test set structures in 3D-QSAR. Dong et al. performed QSAR analyses of aromatic heterocycle thiosemicarbazone analogues to find novel tyrosinase inhibitors. Dong et al. built 3D-QSAR models of dabigatran analogues as thrombin inhibitors. Ding et al. built 3D-QSAR models of 6-aryl-5-cyano-pyrimidine derivatives to explore the structural requirements of LSD1 inhibitors.
We have reported in the literature [37, 38, 39, 40, 41, 42, 43] applications of MEP to investigate the key features of compounds that are necessary for their biological activity and thus to propose new derivatives, together with the construction of chemometric models to indicate the most promising of the new derivatives for synthesis and biological assays. Pinheiro et al. reported the use of MEP and the partial least squares regression (PLS) method in the design of new artemisinin derivatives with activities against
3. MEP and PR techniques as tools to design nitrofuran compounds with biological activity against
3.1.1 Biological recognition process ligand/receptor through the molecular electrostatic potential
The MEP is also suitable for analyzing processes based on the “recognition” of one molecule by another as in drug-receptor and enzyme-substrate interactions, because it is through their potentials that the two species first “see” each other [2, 3, 44, 45, 46].
The MEP derived from the electron density is a very useful property for understanding the sites of electrophilic and nucleophilic attack as well as hydrogen bonding interactions. The MEP at a given point (x, y, z) in the vicinity of a molecule is defined in terms of the interaction energy between the electrical charge generated by the molecule’s electrons and nuclei and a positive test charge (a proton) located at that point. Being a real physical property, MEP can be determined experimentally by diffraction or by computational tools. For the studied nitrofuran molecules, the MEP values were computed through Eq. (1)
where K is the number of nuclei with charges
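For reference, the MEP is conventionally written in the following form (in atomic units). This is the standard textbook definition, given here on the assumption that it is what Eq. (1) expresses; the symbols $Z_A$ (charge of nucleus $A$), $\mathbf{R}_A$ (position of nucleus $A$), and $\rho(\mathbf{r}')$ (electron density) are the usual notation and are not taken from the original text.

```latex
V(\mathbf{r}) \;=\; \sum_{A=1}^{K} \frac{Z_A}{\lvert \mathbf{R}_A - \mathbf{r} \rvert}
\;-\; \int \frac{\rho(\mathbf{r}')}{\lvert \mathbf{r}' - \mathbf{r} \rvert}\, d\mathbf{r}'
```

The first term is the repulsive contribution of the $K$ nuclei and the second is the attractive contribution of the electrons, so negative regions of $V(\mathbf{r})$ mark electron-rich sites.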
3.1.2 PR techniques
In this section, we briefly present the PR techniques used in this chapter. A deeper and more detailed description can be found elsewhere [47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66].
3.1.2.1 Principal component analysis (PCA) technique
When handling large multivariate data sets, it is essential to uncover unknown trends in the data and reduce their dimensionality using exploratory tools. The main idea of the PCA technique is to reduce the dimensionality of a data set consisting of a large number of interrelated variables while retaining as much as possible of the variation present in the data set. This is achieved by transforming the original variables into a new set of variables, the principal components (PCs), which are uncorrelated and ordered so that the first few retain most of the variation present in all of the original variables. As the final result, the PCA technique supports the selection of a small number of variables (molecular properties) considered to be best related to the dependent property or feature, in this study, the biological activity against
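The transformation described above can be illustrated with a minimal NumPy sketch. It is not the Pirouette implementation used in this chapter; the function names `autoscale` and `pca` and the random demonstration matrix are hypothetical and serve only to show how auto-scaled data are turned into uncorrelated, variance-ordered PCs.

```python
import numpy as np

def autoscale(X):
    """Mean-center each column and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca(X, n_components=2):
    """Return scores, loadings, and explained-variance fractions via SVD."""
    Z = autoscale(X)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)           # fraction of variance per PC
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T            # one column per PC
    return scores, loadings, explained[:n_components]

# Hypothetical data: 6 objects described by 4 made-up descriptors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 4))
scores, loadings, explained = pca(X_demo, n_components=2)
```

The scores give the coordinates of each object in the PC1 versus PC2 plot, and the loadings show how much each original variable contributes to each PC.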
3.1.2.2 Hierarchical cluster analysis (HCA) technique
Together with PCA, this technique has become another important tool in pattern recognition. Its purpose is to display the data in a two-dimensional space in such a way as to emphasize their natural clusters and patterns. The results are presented as dendrograms. In the HCA technique, the distances between objects or variables are calculated and converted into a similarity index, which ranges from zero (no similarity and a large distance between objects) to one (identical objects).
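As a rough illustration of the similarity index and of complete-linkage clustering (the linkage used later in this chapter), the sketch below assumes the common definition S_ij = 1 - d_ij/d_max, which is not stated explicitly in the text. The helper names and the four demonstration points are hypothetical.

```python
import numpy as np

def distance_matrix(X):
    """Euclidean distances between all pairs of rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff**2).sum(axis=2))

def similarity_index(D):
    """Assumed definition: 1 for identical objects, 0 for the farthest pair."""
    return 1.0 - D / D.max()

def complete_linkage(D):
    """Naive agglomerative clustering; returns the merge history.

    Each entry is (members_a, members_b, similarity_at_merge).
    """
    clusters = [[i] for i in range(len(D))]
    d_max = D.max()
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members.
                d = max(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], 1.0 - d / d_max))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

# Hypothetical objects: two tight pairs that are far from each other.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
D = distance_matrix(X_demo)
S = similarity_index(D)
merges = complete_linkage(D)
```

The merge history is what a dendrogram draws: the two tight pairs join at high similarity, and the final merge of the two clusters occurs at a similarity of zero.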
3.1.2.3 K-nearest neighbor (KNN) technique
The KNN technique classifies objects by comparing the distances among them. The multivariate Euclidean distances between every pair of objects with known class membership are calculated. The closest K objects are used to build the model. The optimal K is determined by cross-validation applied to the training set objects. A test object is classified on the basis of its multivariate distance to the K nearest objects in the training set. This technique makes no assumption about the size and shape of the training set classes.
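A minimal sketch of this procedure, assuming majority voting among the K nearest training objects and leave-one-out cross-validation to compare values of K, could look as follows. The function names and the two-class demonstration data are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k):
    """Classify x_new by majority vote among its k nearest training objects."""
    X_train = np.asarray(X_train, dtype=float)
    d = np.linalg.norm(X_train - np.asarray(x_new, dtype=float), axis=1)
    nearest = np.argsort(d, kind="stable")[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Fraction of training objects classified correctly when left out."""
    X = np.asarray(X, dtype=float)
    hits = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        y_rest = [y[j] for j in range(len(X)) if j != i]
        if knn_predict(X[mask], y_rest, X[i], k) == y[i]:
            hits += 1
    return hits / len(X)

# Hypothetical training set: two well-separated classes of four objects.
X_demo = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y_demo = ["less"] * 4 + ["more"] * 4
label = knn_predict(X_demo, y_demo, [0.5, 0.5], k=3)
accuracy = loo_accuracy(X_demo, y_demo, k=3)
```

Running `loo_accuracy` for several values of K and comparing the results is one simple way to choose K from the training set alone.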
3.1.2.4 Stepwise discriminant analysis (SDA) technique
This technique separates objects from distinct populations and allocates new objects into populations previously defined. It uses a stepwise procedure in which, at each step, the most powerful variable is entered into the discriminant function. The SDA technique is anchored in the F-test for the significance of variables and at each step selects a variable based on its significance, and, after several steps, the most significant variables are extracted from the set in question [20, 68].
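To make the role of the F-test concrete, the sketch below computes a one-way F statistic for each variable and ranks the variables by it. This covers only the first step of a stepwise procedure; later steps in a full SDA use partial F statistics conditioned on the variables already entered, which are not shown. The function names and demonstration data are hypothetical.

```python
import numpy as np

def f_statistic(x, labels):
    """One-way ANOVA F statistic of a single variable across groups."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    groups = [x[labels == g] for g in np.unique(labels)]
    n, k = len(x), len(groups)
    grand = x.mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_variables(X, labels):
    """Return variable indices sorted by decreasing F, plus the F values."""
    X = np.asarray(X, dtype=float)
    f = np.array([f_statistic(X[:, j], labels) for j in range(X.shape[1])])
    return np.argsort(f)[::-1], f

# Hypothetical data: column 0 separates the two groups, column 1 does not.
X_demo = np.array([[1, 1], [2, 5], [3, 3], [7, 2], [8, 4], [9, 3]], dtype=float)
labels_demo = [0, 0, 0, 1, 1, 1]
order, f_values = rank_variables(X_demo, labels_demo)
```

In this toy case the first variable has a large F value and would be entered first, while the second has an F value of zero and carries no discriminating information.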
3.1.2.5 Soft independent modeling of class analogy (SIMCA) technique
The SIMCA technique develops a principal component model for each training set category. Its main objective is the reliable classification of new samples. When a prediction is made with the SIMCA technique, new samples that are insufficiently close to the PC space of a class are considered nonmembers. Furthermore, the technique requires that each training sample be pre-assigned to one of
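The idea of one PC model per class, with rejection of samples that lie too far from a class PC space, can be sketched as follows. This is a simplified illustration: the membership limit used here (a multiple of the RMS training residual, controlled by the hypothetical `factor` parameter) is an assumption, whereas SIMCA implementations normally derive the limit from an F-test on residual variances. The class name and demonstration data are also hypothetical.

```python
import numpy as np

class ClassModel:
    """PCA model of one class with a residual-based membership limit."""

    def __init__(self, X, n_pcs, factor=2.0):
        X = np.asarray(X, dtype=float)
        self.mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.P = Vt[:n_pcs].T                      # loadings of the class model
        train_res = np.array([self.residual(x) for x in X])
        # Assumed cut-off: `factor` times the RMS residual of the training set.
        self.limit = factor * np.sqrt(np.mean(train_res**2))

    def residual(self, x):
        """Distance from x to the PC space of the class."""
        e = np.asarray(x, dtype=float) - self.mean
        return np.linalg.norm(e - self.P @ (self.P.T @ e))

    def accepts(self, x):
        return self.residual(x) <= self.limit

# Hypothetical classes: A lies near a line along x, B near a line along y.
A = [[0, 0, 0], [1, 0.1, 0], [2, -0.1, 0], [3, 0.05, 0], [4, -0.05, 0]]
B = [[0, 10, 5], [0.1, 11, 5], [-0.1, 12, 5], [0.05, 13, 5], [-0.05, 14, 5]]
model_a = ClassModel(A, n_pcs=1)
model_b = ClassModel(B, n_pcs=1)

sample = [2.5, 0.0, 0.0]      # close to the PC space of class A only
outlier = [10.0, 10.0, 10.0]  # far from both class models
```

A sample accepted by exactly one class model is assigned to that class, while a sample rejected by every model, such as `outlier` here, is treated as a nonmember of all classes.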
3.1.3 Computers, software, compounds, and molecular descriptors
For the present chapter, we performed the molecular calculations on an AMD PHENOM 955 X4 2.2 GHz processor with 4 GB of RAM using the Gaussian 98 program package. The MEP was computed from the electron density, and the maps were displayed using the MOLEKEL software, while the PR models were built on a Pentium PC with the Pirouette program.
Figure 1 shows the 2D structure of the 5-nitrofuran-2-aldoxim molecule used in the selection of the method/basis set (see Section 3.1.3.1). Figures 2 and 3 display the 2D structures of the nitrofuran compounds from the training [73, 74, 75] and prediction sets, respectively. In this work, the nitrofuran molecules were defined as more active against
In general, the structure–activity relationship shows that for the compounds
The molecular descriptors were obtained for the most stable conformation of each compound. These descriptors were computed to give information about the influence of electronic, steric, hydrophilic, and hydrophobic features on the antitrypanosomal activity of the studied nitrofurans. The atomic charges in this work were derived from the electrostatic potential obtained with the HF/6-31G method/basis set as implemented in the Gaussian program package. In this approach, a set of atomic point charges is fitted so that it reproduces as closely as possible the quantum molecular electrostatic potential at a set of points defined around the molecule [76, 77]. Charges derived from the electrostatic potential have the advantage of being, in general, physically more satisfactory than Mulliken charges, especially with regard to biological activity.
The quantum–chemical descriptors obtained with the Gaussian 98 program package were the total energy of the molecule (TE); the highest occupied molecular orbital (HOMO) energy; the energy of the orbital one level below the HOMO (HOMO–1); the lowest unoccupied molecular orbital (LUMO) energy; the energy of the orbital one level above the LUMO (LUMO+1); the difference between the HOMO and LUMO energies (gap energy); the total dipole moment (μ); Mulliken’s electronegativity (χ); the atomic charge on the Nth atom (QN); molecular hardness (HD); and molecular softness (MS).
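Several of these descriptors follow directly from the frontier orbital energies. The sketch below uses definitions that are common in the literature under Koopmans-type approximations; the chapter does not state which formulas were used, so the expressions, the positive sign convention for the gap (LUMO minus HOMO), the softness convention 1/(2η), and the example energies are all assumptions.

```python
def frontier_descriptors(e_homo, e_lumo):
    """Descriptors from HOMO/LUMO energies (any consistent energy unit)."""
    gap = e_lumo - e_homo                 # assumed positive-gap convention
    chi = -(e_homo + e_lumo) / 2.0        # Mulliken electronegativity
    hardness = gap / 2.0                  # molecular hardness
    softness = 1.0 / (2.0 * hardness)     # one common softness convention
    return {"gap": gap, "chi": chi, "hardness": hardness, "softness": softness}

# Hypothetical orbital energies in hartree, for illustration only.
demo = frontier_descriptors(e_homo=-0.35, e_lumo=0.05)
```

With these definitions a smaller gap means a softer molecule, which is consistent with reading the gap energy as a reactivity index.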
The physicochemical descriptors obtained with the ChemPlus module were total surface area (TSA), molecular volume (VOL), molecular refractivity (MR), and molecular hydration energy (MHE).
Molecular holistic (MH) descriptors were included to represent different sources of chemical information in terms of molecular size, symmetry, and distribution of atoms in the molecules. We also included topological indices, connectivity indices, geometric descriptors, 3D-MoRSE descriptors, and the Moriguchi octanol–water partition coefficient (MlogP). These descriptors were calculated with the Dragon software.
3.1.3.1 Theoretical approach and basis set used in the molecular calculations
In the calculations with the nitrofuran compounds (Figure 1), quantum–chemical approaches were used [81, 82, 83, 84, 85, 86, 87]. We used Becke’s three-parameter hybrid functional combined with the Lee-Yang-Parr (LYP) correlation functional (B3LYP), Becke’s 1988 functional combined with LYP (BLYP), the Hartree-Fock (HF) method, the Austin model 1 (AM1) method, the Parametric Method Number 3 (PM3), and standard basis sets available in the Gaussian program package. For 5-nitrofuran-2-aldoxim, geometry optimization was carried out with the B3LYP/6-21G, B3LYP/6-21G*, B3LYP/6-31G, B3LYP/6-31G*, BLYP/6-21G, BLYP/6-21G*, BLYP/6-31G, BLYP/6-31G*, HF/6-21G, HF/6-21G*, HF/6-31G, and HF/6-31G* approaches and basis sets [81, 82, 83, 84] and with the AM1 and PM3 approaches [85, 86]. The calculations were performed to find the approach and basis set that would give the best compromise between computational time and accuracy relative to the experimental data. The experimental structure of the 5-nitrofuran-2-aldoxim molecule was retrieved from the Cambridge Structural Database (CSD). PCA and HCA techniques were used to compare the structures computed with the different quantum–chemical methods/basis sets with the experimental structure of the 5-nitrofuran-2-aldoxim molecule, in order to identify the appropriate method and basis set for further calculations. The analyses were carried out on an auto-scaled data matrix with dimension 26 × 5, where each row was associated with one geometry (26 computed and 1 experimental) and each column represented one of 5 geometrical parameters of the 5-nitrofuran-2-aldoxim molecule (bond lengths and bond angles). The HF/6-31G method was selected (see Results and discussion section) to compute all structures and to perform the calculations of the molecular properties; the initial geometries of the nitrofurans (Figures 2 and 3) were built from the optimized geometry of the 5-nitrofuran-2-aldoxim molecule selected by the PCA and HCA techniques.
A conformational analysis for each compound was carried out with the MM+ algorithm, and the lowest energy conformation was submitted to a conformational search with the Gaussian program.
3.2 Results and discussion
3.2.1 Quantum–chemical approach and basis set selection for the description of the geometries of nitrofurans
The advantage of using the PCA and HCA techniques in this step is that all structural information is considered simultaneously and the correlations among the parameters are taken into account. Table 1 shows the theoretical and experimental structural information (bond lengths and bond angles) for the geometry of the 5-nitrofuran-2-aldoxim molecule. It was used, together with the PCA and HCA techniques, to select the quantum–chemical approach and basis set that give results closest to the experimental data.
|Geometric parameters||B3LYP/6-21G||B3LYP/6-21G*||B3LYP/6-31G||B3LYP/6-31G*||BLYP/6-21G||BLYP/6-21G*||BLYP/6-31G||BLYP/6-31G*||HF/6-21G||HF/6-21G*||HF/6-31G||HF/6-31G*||AM1||PM3||Exp |
|Bond length (Å)|
|Bond angle (°)|
The first two principal components explain 86.02% of the original information as follows: PC1 = 58.01% and PC2 = 28.02%. The PC1 versus PC2 scores plot is shown in Figure 4, from which it can be seen that the methods are discriminated into two classes according to PC2. The semiempirical approaches (AM1 and PM3) are at the top of the graph, while the other theoretical approaches (HF, BLYP, and B3LYP) and the experimental data are at the bottom. Moreover, it can be seen that the HF/6-31G approach/basis set is the closest to the experimental data, indicating that it should be used in the development of this work.
We also used HCA to investigate the most appropriate approach and basis set for further calculations. Figure 5 shows the dendrogram obtained with the complete linkage method; from this figure, we conclude that the theoretical approaches are distributed in a similar way as in PCA, i.e., HCA confirmed the PCA results. Moreover, we can observe that the HF/6-31G approach/basis set is the closest to the experimental data and is therefore the most suitable for carrying out this work.
3.2.2 MEP maps for compounds of the training set
Figure 6 shows the MEP maps for the nitrofurans in the training set. The analysis of these maps reveals that the most active compounds, in general, have the following characteristics:
(i) Compounds with an unsaturation and presenting O atom neighboring the carbonyl in the carbonic chain present greater electron density in the proximities of the furan ring with the decrease of the chain size. In these compounds (
(ii) Compounds with double unsaturation, containing O atom neighboring the carbonyl, raising the carbon chain, increase the electron density in the atoms of the nitro group, extending through the O atom of the furan ring to the O atoms of the ester group following the unsaturated chain. In these compounds (
(iii) Compound with an unsaturation, N atom neighboring the carbonyl in the carbonic chain and bulky substituents, has higher electron density in the vicinity of the furan ring and in the N and O atoms of the amide group. In this compound (
From the above discussion, as a rule, more active nitrofurans can be planned by starting from one of the basic structures of the most active compounds and introducing electron-donating groups of atoms or substituents that enhance the key structural features necessary for their activity.
3.2.3 Chemometric modeling
To perform the chemometric modeling, all variables were auto-scaled as a pre-processing step so that they were standardized and carried the same weight regardless of their original scale. Furthermore, given the large quantity of multivariate data available, it was necessary to reduce the number of variables. Thus, if any two descriptors had a high Pearson correlation coefficient (r > 0.8), one of the two was excluded from the matrix at random, since theoretically they describe the same property; both also have a high correlation with antitrypanosomal activity, so only one of them is needed as an independent variable in a predictive model.
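The variable-reduction rule described above can be sketched as follows. The function name, the use of the absolute value of r (so that strongly anti-correlated pairs are also filtered), the fixed random seed, and the three demonstration descriptors are assumptions made for illustration.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.8, seed=0):
    """Randomly drop one descriptor from every pair with |r| > threshold."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    r = np.abs(np.corrcoef(X, rowvar=False))   # Pearson correlation matrix
    keep = list(range(X.shape[1]))
    for i in range(X.shape[1]):
        for j in range(i + 1, X.shape[1]):
            if i in keep and j in keep and r[i, j] > threshold:
                keep.remove(int(rng.choice([i, j])))
    return [names[k] for k in keep]

# Hypothetical descriptors: "b" is a linear function of "a"; "c" is not.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = 2.0 * a + 1.0
c = np.array([3.0, 1.0, 4.0, 2.0, 5.0])
kept = drop_correlated(np.column_stack([a, b, c]), ["a", "b", "c"])
```

In this toy example exactly one of `a` and `b` survives, while `c` (correlation 0.5 with both) is always kept.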
3.2.3.1 PCA model
Four molecular descriptors were selected for PCA model. The molecular descriptors (QN1, gap energy, Mor05u, and MlogP), in vitro
|Nitrofurans||QN1||Gap energy (kcal/mol)||Mor05u||MlogP||% in vitro ||Activityc|
Table 3 shows the loading vectors for PC1, PC2, and PC3. According to this table, PC1 can be expressed through the following equation:
From this equation, more active nitrofurans, in general, can be obtained when we have lower values for the QN1 combined with lower values for Gap energy and Mor05u and higher values for MlogP.
3.2.3.2 HCA model
The results of the HCA model are displayed in the dendrogram in Figure 9 and are similar to those of PCA model. The nitrofurans are fairly well grouped according to their activity. From this figure, the two clusters (+ and −) mirror the same two classes displayed by PCA model (Figure 7).
3.2.3.3 KNN model
Table 4 shows the results for the KNN models constructed with one (1NN) to four (4NN) nearest neighbors. For all models, the percentage of correct information was 100%. We used the 4NN model because the greater the number of nearest neighbors, the better the reliability of the KNN technique; the same model was used for validation of the training set from Figure 2.
|Category||Number of compounds||Incorrectly classified (1NN)||Incorrectly classified (2NN)||Incorrectly classified (3NN)||Incorrectly classified (4NN)|
|Class: less active||14||0||0||0||0|
|% Correct information|| ||100||100||100||100|
3.2.3.4 SDA model
In the construction of the SDA model, the discrimination functions for groups more active and less active, respectively, are given below:
Group MA (more active):
Group LA (less active):
Using the discrimination functions, Eqs. (3) and (4), and the value of each descriptor for the nitrofurans, we obtained the classification matrix for all compounds of the training set (Table 5). The classification error was 0.00%, resulting in a satisfactory separation of more active and less active compounds. From SDA model, the allocation rule was derived when the activity against
|Classification group or class||Number of compounds||More active||Less active|
|Group (Class): more active||9||9||0|
|Group (Class): less active||14||0||14|
|% Correct information||—||100||100|
In order to check the reliability of the model, the “leave-one-out technique” was employed. One nitrofuran compound is excluded from the data set, and the remaining compounds are used in building the classification functions.
Subsequently, the removed analogue is classified according to the generated classification functions. In the next step, the omitted compound is included again, a new nitrofuran is removed, and the procedure continues until the last compound has been removed. Table 6 summarizes the results obtained with the cross-validation model.
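The leave-one-out procedure can be sketched with linear classification functions built from a pooled covariance matrix. This is a generic linear discriminant sketch with equal prior probabilities assumed; it is not the fitted functions of Eqs. (3) and (4), and the function names and demonstration data are hypothetical.

```python
import numpy as np

def fit_classification_functions(X, y):
    """Return {class: (constant, weights)} using the pooled covariance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    pooled = sum((X[y == c] - means[c]).T @ (X[y == c] - means[c])
                 for c in classes) / (len(X) - len(classes))
    inv = np.linalg.pinv(pooled)
    # Score of class c for object x: constant_c + weights_c . x
    return {c: (-0.5 * means[c] @ inv @ means[c], inv @ means[c])
            for c in classes}

def classify(functions, x):
    """Assign x to the class whose classification function is largest."""
    return max(functions, key=lambda c: functions[c][0] + functions[c][1] @ x)

def leave_one_out(X, y):
    """Fraction of objects classified correctly when each is left out once."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    correct = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        functions = fit_classification_functions(X[mask], y[mask])
        if classify(functions, X[i]) == y[i]:
            correct += 1
    return correct / len(X)

# Hypothetical two-descriptor data with two well-separated groups.
X_demo = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [6, 5], [5, 6], [6, 6]]
y_demo = ["LA"] * 4 + ["MA"] * 4
loo_rate = leave_one_out(X_demo, y_demo)
```

Because each object is classified by functions built without it, the resulting rate is a less optimistic estimate of model reliability than the classification matrix obtained with all compounds.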
|Classification group or class||Number of compounds||More active||Less active|
|Group (class): more active||9||9||0|
|Group (class): less active||14||0||14|
|% correct information||—||100||100|
3.2.3.5 SIMCA model
The SIMCA model was built with the same descriptors as the PCA, HCA, KNN, and SDA models and used two (2) PCs in the modeling of the two classes: more active nitrofurans (
|Category||Number of compounds||Correct classification|
|Class: more active||9||9|
|Class: less active||14||14|
|% correct information||100|
Since the QN1, gap energy, Mor05u, and MlogP properties were selected in the chemometric modeling as the most important characteristics for describing the antitrypanosomal activity, some considerations about them may help in understanding the behavior of the more active nitrofurans. According to classical chemical theory, chemical interactions can be classified into two categories: electrostatic (polar) or orbital (covalent). Electrical charges in the molecule are the driving force of electrostatic interactions. It has been demonstrated that local electron densities or charges are important in many chemical reactions, physicochemical properties, and ligand–receptor interactions [89, 90]. Thus, charge-based parameters have been widely employed as chemical reactivity indices or as measures of weak intermolecular interactions. Many quantum–chemical descriptors are derived from the partial charge distribution in a molecule or from the electron densities on particular atoms. From Table 2, we can observe that, in general, QN1 takes lower values for the more active analogues than for the less active ones. This is an indication that biological processes can occur through electrostatic interactions between the more active nitrofurans and a possible biological receptor.
Gap energy is an important stability index. A large gap energy implies high stability for the molecule in the sense of its lower reactivity in chemical reactions. It is an approximation of the lowest excitation energy of the molecule and can be used for the definition of absolute and activation hardness [89, 90]. In Table 2, we can observe that, in general, the more active nitrofurans present lower gap energy than the less active ones. This indicates that the more active nitrofurans have a great probability of interacting with the biological receptor through a charge transfer mechanism.
Mor05u is a 3D-MoRSE descriptor based on the idea of obtaining information from 3D atomic coordinates through the transform used in electron diffraction studies and is strictly related to the stereochemistry of the compounds. According to Table 2, the more active nitrofurans present lower values of Mor05u. In general, this may indicate the importance of the stereochemical properties of the more active nitrofurans in their possible mechanism of action.
MlogP is an important hydrophobic descriptor in diverse biochemical, pharmacological, and toxicological processes involved in drug absorption. As identified in Table 2, the more active reported nitrofurans exhibit the higher MlogP values. This is an indication that in processes involving nitrofurans and a biological receptor, hydrophobic interactions may be important in the mechanism of action of these compounds.
Knowing the performance of the PR models constructed for the 23 studied nitrofurans, we decided to apply them to a series of eight compounds (Figure 3) designed to maintain the key structural features necessary for biological activity, as evidenced by the MEP maps of the compounds of the training set. The basic nucleus of these compounds corresponds to that of the most active nitrofurans with double unsaturation, containing vicinal O atom to carbonyl (see compounds
The results obtained from the application of the PR models (PCA, HCA, KNN, SDA, and SIMCA) and the descriptors for the compounds of the prediction set are summarized in Tables 8 and 9, respectively. In Table 8, the compounds
|Nitrofuran||PCA model||HCA model||KNN model||SDA model||SIMCA model|
|Nitrofuran||QN1||Gap energy (kcal/mol)||Mor05u||MLogP|
3.2.4 MEP maps for compounds of the prediction set
Figure 10 shows the MEP maps for the most active nitrofurans in the validation set (
The negative MEP region of compounds
3.3 Concluding remarks
In recent decades, MEP and chemometric techniques have become efficient tools in the study of the structure–activity relationships of bioactive molecules. These tools have been used either individually, according to the principles inherent to each, or in combination, to unravel information about the structure–activity relationships of pharmacologically interesting compounds more efficiently. This chapter follows the second approach. MEP maps were constructed for 23 nitrofurans with activity against
PR models (PCA, HCA, KNN, SDA, and SIMCA) were constructed and demonstrated that 23 nitrofurans can be classified into two classes or groups: more active and less active according to their degrees of activity against
The results of the application of PR models on the validation set evidenced two nitrofurans (
We gratefully acknowledge the financial support of the Brazilian agencies: Conselho Nacional de Desenvolvimento Científico e Tecnológico and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior. The authors would like to thank the Virtual Computational Chemistry Laboratory (VCCLAB–Munich) and the Swiss Center for Scientific Computing for the use of the DRAGON and MOLEKEL software, respectively. We employed computing facilities at the Laboratório de Química Teórica e Computacional (LQTC)–Universidade Federal do Pará.