Molecular Interactions in Chromatographic Retention: A Tool for QSRR/QSPR/QSAR Studies

Vilma Edite Fonseca Heinzen¹*, Berenice da Silva Junkes², Carlos Alberto Kuhnen³ and Rosendo Augusto Yunes¹
¹Department of Chemistry, Federal University of Santa Catarina, University Campus, Trindade, Florianopolis, Santa Catarina
²Federal Institute of Education, Science and Technology of Santa Catarina, Mauro Ramos Avenue, No. 950, Center, Florianopolis, SC
³Department of Physics, Federal University of Santa Catarina, University Campus, Trindade, Florianopolis, Santa Catarina
Brazil

electronic and/or polar features of molecules (Galvez et al., 1994; Hall et al., 1991). The molecular size, shape, polarity, and ability to participate in hydrogen bonding are among the factors that can contribute to the physicochemical properties or biological activities of a molecule. It is well known that these factors are related to intermolecular interactions such as van der Waals forces.
The use of graph-theoretical topological indices in QSPR/QSAR studies has attracted great interest in recent years. Topological indices have become a powerful tool for predicting numerous physicochemical properties and/or biological activities of compounds, as well as for molecular design. One of the most extensively studied properties is chromatographic retention (Estrada & Gutierrez, 1999; Ivanciuc, O. et al., 2000; Ivanciuc, T. & Ivanciuc, O., 2002; Katritzky et al., 1994; Katritzky et al., 2000; Pompe & Novic, 1999; Ren, 1999, 2002a). Quantitative structure-chromatographic retention relationships (QSRR) have been widely investigated for gas chromatography (GC) and high-performance liquid chromatography (HPLC) (Markuszewski & Kaliszan, 2002). Topological indices (TI) are obtained via mathematical operations from the corresponding molecular graphs of compounds (Ivanciuc, O. et al., 2002; Kier & Hall, 1976; Liu, S.-S. et al., 2002; Marino et al., 2002; Rios-Santamarina et al., 2002; Toropov & Toropova, 2002), in contrast to the physicochemical characterization used by traditional QSAR (García-Domenech et al., 2002). One of the main advantages of TI is that they can be easily and rapidly computed for any constitutional formula and generally correlate well with measured properties. However, important disadvantages should be noted, including the difficulty of encoding stereochemical information (for example, distinguishing between cis- and trans-isomers) and their lack of physical meaning. Many topological indices have been proposed since the pioneering studies by Wiener (Wiener, 1947) and by Kier on the use of QSAR (Kier & Hall, 1976). The TI developed for QSAR/QSRR studies can be illustrated by Estrada's approach to edge weights using quantum chemical parameters (Estrada, 2002) and by Ren's atom-type AI topological indices derived from the topological distance sums and vertex degree (Ren, 2002d).
Based on a chromatographic behavior hypothesis, our group developed a topological index called the semi-empirical topological index (I_ET). This index was initially developed to predict the chromatographic retention of linear and branched alkanes and linear alkenes, with the objective of differentiating their cis- and trans-isomers and obtaining QSRR models (Heinzen et al., 1999b). The excellent results achieved stimulated our group to extend the new topological descriptor to other classes of compounds (Amboni et al., 2002a, 2002b; Arruda et al., 2008; Junkes et al., 2002a, 2002b, 2003a, 2003b, 2004; Junkes et al., 2005; Porto et al., 2008). The equation used to calculate the I_ET was generated from the molecular graph, and the values of the carbon atoms and functional groups were attributed by observing the experimental chromatographic behavior, supported by theoretical considerations. This approach was adopted because of the difficulty in obtaining a complete theoretical description of the interaction between the stationary phase and the solute. Based only on theoretical equations or hypotheses it is not possible, for example, to estimate how the molecular conformation of the solute affects the intermolecular forces. In view of this, it seems reasonable to assume that the experimental behavior can provide insights regarding these factors, which can then be applied to other processes involved in QSPR studies. Thus, the semi-empirical topological index has a clear physical meaning.
The semi-empirical topological index (I_ET) allowed the creation of a new descriptor, the electrotopological index, I_SET, which was recently developed by our group and applied to QSPR studies to predict the chromatographic retention index for a large number of organic compounds, including aliphatic hydrocarbons (alkanes and alkenes), aldehydes, ketones, esters and alcohols (Souza et al., 2008, 2009a, 2009b, 2010). The new descriptor for the above series of molecules can be quickly calculated from atomic charges obtained with the semi-empirical quantum-chemical AM1 method, which correlate with the approximate numerical values attributed by the semi-empirical topological index to the primary, secondary, tertiary and quaternary carbon atoms. Thus, unifying the quantum-chemical and topological methods provided a three-dimensional picture of the atoms in the molecule. It is important to note that the AM1 method yields charges, dipole moments and bond lengths that are more reliable than those obtained from time-consuming, low-quality ab initio calculations, that is, those employing a minimal basis set. Although the calculated partial atomic charges may be less reliable than other molecular properties, and different semi-empirical methods give net charges in poor numerical agreement with one another, it is important to recognize that their calculation is easy and that the values at least indicate the trends of the charge density distributions in the molecules. Since many chemical reactions and physicochemical properties are strongly dependent on local electron densities, net atomic charges and other charge-based descriptors are currently used as chemical reactivity indices.
For alkanes and alkenes, this correlation allowed the creation of a new semi-empirical electrotopological index (I_SET) for QSRR models, based on the fact that the interactions between the solute and the stationary phase are due to electrostatic and dispersive forces. This new index is able to distinguish between cis- and trans-isomers directly from the net atomic charges of the carbon atoms obtained from quantum-chemical calculations (Souza et al., 2008). For polar molecules such as aldehydes, ketones, esters and alcohols, the presence of heteroatoms such as oxygen considerably changes the charge distribution relative to the corresponding hydrocarbons, leading to a small increase in the interactions between the solute and the stationary phase (Souza et al., 2009a, 2009b, 2010). An appropriate way to calculate the I_SET was developed, taking into account the dipole moment exhibited by these molecules and the atomic charges of the heteroatoms and of the carbon atoms attached to them. When the stationary phase is treated as a non-polar material, the interaction between these molecules and the stationary phase is electrostatic with a contribution from dispersive forces. These interactions are slightly stronger than for the corresponding hydrocarbons as a result of the charge redistribution that occurs in the presence of the heteroatom, and this redistribution accounts for the dipole moment of the molecules. Clearly, the main changes in the charge distribution due to the presence of the (oxygen) heteroatoms occur in their neighborhood, and the excess charge of these atoms leads to electrostatic interactions that are stronger than the weak dispersive dipolar interactions.

Semi-empirical topological index (I_ET)
Three important factors led us to develop the semi-empirical topological index: (i) no topological index alone was able to differentiate between the cis- and trans-isomeric structures of alkenes; (ii) if every carbon atom is assigned a value of 100, as indicated by Kovàts, the experimental results do not yield a constant value for each of the different types of carbon atom (secondary, tertiary and quaternary) in alkanes; (iii) when the Kovàts retention indices of highly branched alkanes are correlated with the number of carbon atoms, the linear fit is unacceptably poor. It is known that chromatographic separation results from the forces that operate between the solute molecules and the molecules of the stationary phase. The retention of alkanes and alkenes depends on the number of carbon atoms and on the interaction of each specific carbon atom with the stationary phase. The interaction of the stationary phase with a carbon atom is determined by the electrical properties of that atom and by the steric hindrance to this interaction caused by the other carbon atoms attached to it. The values attributed to the carbon atoms were based on the experimental chromatographic behavior of the molecules, which reflects the real electrical and steric characteristics of the carbon atoms. For this reason the index is called semi-empirical. The representation of the molecules was based on molecular graph theory, where the carbon atoms are considered as the vertices of the graph and the hydrogens are suppressed (Hansen & Jurs, 1988). Thus, it is called a topological index.

Calculation of I_ET for alkanes and alkenes
Values were attributed to the carbon atoms (vertices of the molecular graphs) according to the following considerations.
(i) According to the Kovàts convention, the retention index of linear alkanes increases linearly with the number of carbon atoms (Kovàts, 1968). However, branched alkanes do not show this linear relationship, since the retention contributed by tertiary and quaternary carbon atoms is decreased by the steric effects of their neighboring groups. It is evident that secondary, tertiary and quaternary carbon atoms have values lower than the 100 index units (i.u.) attributed by Kovàts.
(ii) Observing the experimental chromatographic behavior, approximate numerical values were attributed: 100 i.u. for the carbon atom of the methyl group, in agreement with Kovàts, 90 i.u. for secondary carbon atoms, 80 i.u. for tertiary and 70 i.u. for quaternary carbon atoms. All values were divided by 100 to make them consistent with common topological values.
(iii) The contribution of these carbon atoms to the chromatographic retention also depends on the neighboring substituent groups, due to steric effects. In estimating the steric effects, it was observed that the experimental RI values decreased as the branching increased, following a logarithmic trend. Therefore, it was necessary to add the logarithm of the value of each adjacent carbon atom. Thus, the new semi-empirical topological index (I_ET) is expressed as:
$$I_{ET} = \sum_i (C_i + \delta_i), \qquad \delta_i = \sum_{j \sim i} \log C_j \qquad (1)$$
where C_i is the value attributed to each carbon atom in the molecule, δ_i is the sum of the logarithms of the values of the adjacent carbon atoms (C_1, C_2, C_3 and C_4), and ~ means 'adjacent to'.
(iv) For alkenes, the main interaction force between the solute and the stationary phase is the dispersive force, which is reduced by neighboring steric effects; however, the electrostatic force is also involved. The influence of conformational effects on the intermolecular forces makes it very difficult to predict these effects based only on theoretical considerations. For this reason, the values attributed to the carbon atoms of the double bond in alkenes were calculated by numerical approximation based on the experimental retention indices, as described in our previous publications (Heinzen et al., 1999b; Junkes et al., 2002a).
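The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that Equation 1 has the form I_ET = Σ(C_i + δ_i) with δ_i the sum of the logarithms of the adjacent values, that the logarithm is base 10, and that the degree of a vertex in the hydrogen-suppressed graph identifies the carbon type. The function names and the adjacency-dictionary encoding are hypothetical. The fitted values for double-bond carbon atoms are not reproduced in this text, so only alkanes are covered.

```python
import math

# Approximate values attributed to carbon atoms, keyed by the number of
# carbon neighbours: primary 1.0, secondary 0.9, tertiary 0.8, quaternary 0.7.
C_VALUE = {1: 1.0, 2: 0.9, 3: 0.8, 4: 0.7}


def i_et(adjacency, values=None):
    """Assumed form of Equation 1: sum over vertices of C_i + delta_i.

    adjacency maps each vertex to the list of its neighbours in the
    hydrogen-suppressed graph. values may override the degree-based C_i.
    The base-10 logarithm is an assumption.
    """
    if values is None:
        values = {v: C_VALUE[len(nbrs)] for v, nbrs in adjacency.items()}
    total = 0.0
    for v, nbrs in adjacency.items():
        delta = sum(math.log10(values[n]) for n in nbrs)
        total += values[v] + delta
    return total


def linear_alkane(n):
    """Adjacency of the path graph of an n-carbon linear alkane (n >= 2)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


pentane = linear_alkane(5)
# 2-methylbutane: C0-C1(-C4)-C2-C3
isopentane = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
# 2,2-dimethylpropane: central C0 bonded to four methyl groups
neopentane = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}

for name, graph in [("pentane", pentane),
                    ("2-methylbutane", isopentane),
                    ("2,2-dimethylpropane", neopentane)]:
    print(f"{name}: I_ET = {i_et(graph):.4f}")
```

Under these assumptions the index decreases from pentane to 2-methylbutane to 2,2-dimethylpropane, which is the qualitative effect of branching that the logarithmic term is meant to capture.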

Calculation of I_ET for compounds with oxygen-containing functional groups
The values attributed to the carbon atoms and functional groups (vertices of the molecular graphs) were based on the following considerations:
(i) For this group of compounds, the main intermolecular forces that contribute to their chromatographic behavior on low-polarity stationary phases are dispersive and inductive forces. The values attributed to the functional groups are also based on the experimental retention index.
(ii) The -COO- (ester), C=O (ketone or aldehyde) and C-OH (alcohol) groups were each considered as a single vertex of the molecular graph of the compounds studied. This was done because of the difficulty and inconsistency associated with calculating individual values for the carbon and oxygen atoms of these groups. Treating each group as a single vertex gave better numerical approximations, capable of reflecting the experimental chromatographic behavior of these compounds.
(iii) The same considerations that were taken into account in developing the semi-empirical topological method for predicting the retention indices of alkanes and alkenes (Heinzen et al., 1999b; Junkes et al., 2002a) were employed to develop the I_ET for oxo-compounds.
(iv) The contribution of the carbon atoms and functional groups to the chromatographic retention was represented by a single symbol, C_i, as indicated in Equation 1. The semi-empirical topological index can thus be expressed by a general equation for the entire set of compounds included in this work:
$$I_{ET} = \sum_i (C_i + \delta_i)$$
where C_i is the value attributed to the -COO- (ester), C=O (ketone or aldehyde) or C-OH (alcohol) group and/or to each carbon atom i in the molecule; δ_i is the sum of the logarithms of the values of the adjacent carbon atoms (C_1, C_2, C_3 and C_4) and/or of the adjacent -COO-, C=O or C-OH groups; and ~ means 'adjacent to'. In a first step, an approximate I_ET (I_ETa) was calculated for each compound, using the equation previously obtained for linear alkanes containing from 3 to 10 carbon atoms and the experimental Kovàts retention indices of the compounds (Heinzen et al., 1999b).
(v) Subsequently, the values of C_i for primary and secondary carbon atoms, previously attributed to alkanes (Heinzen et al., 1999b), and the approximate I_ET calculated above were used in Equation 1 in order to calculate the values of the -COO-, C=O and C-OH groups of linear compounds. Thus, values were attributed to each class of functional group according to the position of the group in the carbon chain.
(vi) One of the fundamental factors taken into consideration in the development of this topological index was the importance of steric and other mutual intramolecular interactions between the functional group and nearby atoms. Therefore, for branched molecules, the values attributed to carbon atoms in the α, β and γ positions with respect to the functional groups differ from those previously attributed to alkanes (Heinzen et al., 1999b), as described in the literature (Amboni et al., 2002a, 2002b; Junkes et al., 2003b, 2004).
The values of C_i for the carbon atoms and the values attributed to the functional groups of esters, aldehydes, ketones and alcohols are listed in Table 1 of Junkes et al. (2004).
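The two-step procedure for linear compounds (an approximate I_ET from the linear-alkane equation, followed by solving Equation 1 for the functional-group value) can be sketched as follows. This is a hedged illustration under several assumptions: the calibration line is recomputed here from linear alkanes C3 to C10 using the Kovàts convention RI = 100 n instead of taking the published coefficients; the logarithm is base 10; the unknown vertex value is found by bisection; and all function names are hypothetical. The retention index of 767 for 2-hexanone is the experimental value quoted later in this chapter. The resulting number only illustrates the mechanics and is not the tabulated value of Junkes et al. (2004).

```python
import math


def i_et(values, adjacency):
    """Assumed Equation 1: C_i plus the log10 of each neighbour's value."""
    return sum(values[v] + sum(math.log10(values[n]) for n in nbrs)
               for v, nbrs in adjacency.items())


def path(n):
    """Adjacency of an unbranched chain of n vertices."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def chain_values(n, group_at=None, group_value=None):
    """Primary (1.0) and secondary (0.9) values along a linear chain;
    optionally one vertex carries a functional-group value instead."""
    vals = {i: (1.0 if i in (0, n - 1) else 0.9) for i in range(n)}
    if group_at is not None:
        vals[group_at] = group_value
    return vals


# Step 1: least-squares line RI = b + a * I_ET for linear alkanes C3..C10,
# whose Kovats indices are 100 * n by definition.
xs = [i_et(chain_values(n), path(n)) for n in range(3, 11)]
ys = [100.0 * n for n in range(3, 11)]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
     / sum((x - mx) ** 2 for x in xs))
b = my - a * mx


def solve_group_value(ri_exp, n, group_at, lo=0.1, hi=10.0, tol=1e-10):
    """Step 2: bisection for the vertex value that reproduces the
    approximate I_ET obtained from the experimental retention index."""
    target = (ri_exp - b) / a  # approximate I_ET (I_ETa)

    def residual(x):
        return i_et(chain_values(n, group_at, x), path(n)) - target

    # I_ET grows monotonically with the vertex value, so bisection applies.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# 2-hexanone as a six-vertex chain with the C=O vertex at index 1 (0-based).
c_carbonyl = solve_group_value(767.0, n=6, group_at=1)
print(f"a = {a:.3f}, b = {b:.3f}, C(C=O) = {c_carbonyl:.3f}")
```

Bisection is used only because the group value appears both linearly and inside logarithms, so Equation 1 cannot be inverted in closed form; any one-dimensional root finder would serve equally well.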

Calculation of I_ET for alkylbenzene compounds
The same considerations employed in the generation of the semi-empirical topological index, I_ET, for linear and branched alkanes and alkenes (Heinzen et al., 1999b; Junkes et al., 2002a) were applied to this group of compounds (alkylbenzenes). Firstly, the molecules were represented by hydrogen-suppressed molecular graphs based on chemical graph theory (Hansen & Jurs, 1988), where the carbon atoms were considered as the vertices of the molecular graph. The contribution of each carbon atom to the chromatographic retention is represented by a single symbol, C_i, as in Eq. (1), where C_i is the value attributed to (=C<) fragments and/or to each carbon atom i in the molecule, and δ_i is the sum of the logarithms of the values of the adjacent carbon atoms (C_1, C_2, C_3 and C_4). The values of C_i for the carbon atoms of linear, branched, ortho-, meta- and para-substituted, tri-substituted and tetra-substituted alkylbenzenes can be found in Porto et al. (2008).

Calculation of I_ET for halogenated aliphatic compounds
The present approach is based on the representation of molecules by hydrogen-suppressed molecular graphs which, in turn, are based on chemical graph theory, where the carbon atoms (C_i) are the graph vertices. Like the carbon atoms, the C-X and X-C-X fragments (where X is a chlorine, bromine or iodine atom) are each considered a single vertex of the molecular graph, as previously done for the functional groups (Heinzen et al., 1999b). The I_ET is expressed as in Equation (1), where C_i is the value attributed to each carbon atom i and/or to the C-X or X-C-X fragments in the molecule, and δ_i is the sum of the logarithms of the values of the adjacent carbon atoms (C_1, C_2, C_3 and C_4) and/or of the adjacent C-X and X-C-X fragments. The values attributed to the carbon atoms and to the functional groups (C_i) of halogenated hydrocarbons are calculated by numerical approximation based on the experimental retention index (RI_Exp) values and supported by theoretical considerations. The values of C_i for the carbon atoms of linear and branched halogenated aliphatic compounds can be found in Arruda et al. (2008).

Development of QSRR models using the I_ET
As the starting point, the I_ET was developed for alkanes on a low-polarity stationary phase. These are the simplest compounds and their properties are almost completely dependent on topological features. Subsequently, this novel topological descriptor was extended to different classes of organic compounds with more complex structural features. A summary of the best simple linear regression models (RI = b + a I_ET) and the statistical data for each data set of compounds, obtained in previous QSRR studies, is given in Table 1.

The semi-empirical electrotopological index, I_SET
The semi-empirical topological index (I_ET) discussed in the previous section allowed the creation of a new descriptor, the electrotopological index, I_SET, which was developed and applied to QSPR studies to predict the retention index, boiling point and octanol/water partition coefficient (log P) for a large number of organic compounds, including aliphatic hydrocarbons (alkanes and alkenes), aldehydes, ketones, esters and alcohols (Souza et al., 2008, 2009a, 2009b, 2010). This new descriptor can be quickly calculated for this series of molecules from atomic charges obtained through the semi-empirical quantum-chemical AM1 method (Bredow & Jug, 2005; Smith, 1996), since it was found that the atomic charges correlate with the approximate numerical values attributed by the semi-empirical topological index to the primary, secondary, tertiary and quaternary carbon atoms.

Calculation of I_SET for alkanes and alkenes
For alkanes and alkenes, the above-mentioned correlation allowed the creation of a new semi-empirical electrotopological index (I_SET) for QSRR models, based on the fact that the interactions between the solute and the stationary phase are due to electrostatic and dispersive forces (Souza et al., 2008). This new index is able to distinguish between cis- and trans-isomers directly from the net atomic charges of the carbon atoms obtained from quantum-chemical calculations. More precisely, the semi-empirical electrotopological index, I_SET, was developed as a refinement of the previous semi-empirical topological index, I_ET. The values of the C_i fragments that were first attributed from the experimental chromatographic retention and theoretical deductions correlate very well with the net atomic charges of the carbon atoms.
Thus, the values attributed to the vertices (C_i) in the hydrogen-suppressed graph are calculated from the correlation between the net atomic charge on each carbon atom, obtained from semi-empirical quantum-chemical calculations, and the C_i values for primary, secondary, tertiary and quaternary carbon atoms (1.0, 0.9, 0.8 and 0.7, respectively) obtained from the experimental values. This shows that it is possible to calculate a new index, I_SET (the semi-empirical electrotopological index), from the net atomic charges obtained from a Mulliken population analysis using the semi-empirical AM1 method and their correlation with the values attributed to the different types of carbon atoms. This demonstrates that the I_SET encodes information on the charge distribution of the solute, which drives the dispersive and electrostatic interactions between the solute (alkanes and alkenes) and the stationary phase (Souza et al., 2008).
Since the interactions between the solute and the stationary phase are dispersive for alkanes and electrostatic for alkenes, the chromatographic retention is strongly dependent on the electronic charge distribution of each carbon atom of these molecules. A simple linear regression equation was obtained between the values of the carbon atoms (SET_i values) based on experimental gas chromatographic retention (1.0 for primary, 0.9 for secondary, 0.8 for tertiary and 0.7 for quaternary carbon atoms) and the net atomic charges (δ_i) of these atoms, as given in Equation (2).
This indicates that the physical reality encoded by the semi-empirical topological index (I_ET) developed in our laboratory is closely related to the net atomic charges which, as is well known, play an important role in intermolecular interactions. It is clear that the interactions between the non-polar stationary phases and the different compounds were determined predominantly by the electronic charge distribution of the molecular structures of the compounds analyzed by gas chromatography. From Equation (2) it is clear that knowledge of the net atomic charges is sufficient to calculate the SET_i value for all kinds of carbon atoms, and not only the values given by the carbon models (that is, 1.0, 0.9, 0.8 and 0.7) or in specific tables. Hence, the above method of calculating the SET_i values of the carbon atoms allows a new index to be created, called the semi-empirical electrotopological index, I_SET. Considering the steric effects of the neighboring carbon atoms, as in the calculation of I_ET, this new index can be calculated according to Equation (3):
$$I_{SET} = \sum_i \Big( SET_i + \sum_{j \sim i} \log SET_j \Big) \qquad (3)$$
In the above expression the sum over i runs over all the atoms of the molecule (excluding the H atoms) and the inner sum over j runs over the atoms attached to atom i. The cis-2-pentene and trans-2-pentene molecules represented in the graph below are taken as an example of the I_SET calculation.
The net atomic charges and SET_i values for these molecules are given in the accompanying table. As expected on physical-chemical grounds, the AM1 calculation reveals that the optimized structures of the cis- and trans-isomers have slightly different charge distributions. The Mulliken population analysis gives the net atomic charges of the carbon atoms for each isomer, and the resulting difference in the SET_i values is sufficient to give different I_SET values.
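The way a small difference in atomic charges propagates to the index can be sketched in Python. Several things here are assumptions rather than the published method: Equation 2 is written in generic linear form with coefficients supplied by the caller (its fitted coefficients are not given in this text), Equation 3 is taken as the sum of SET_i plus the base-10 logarithms of the SET values of the attached atoms, and the charges used in the example are illustrative placeholders, not AM1/Mulliken results. Function and variable names are hypothetical.

```python
import math


def set_values(charges, intercept, slope):
    """Generic linear form of Equation 2: SET_i = intercept + slope * q_i.
    The fitted coefficients are not given here and must be supplied."""
    return {atom: intercept + slope * q for atom, q in charges.items()}


def i_set(set_vals, adjacency):
    """Assumed form of Equation 3: SET_i plus log10 of each attached SET_j,
    summed over all non-hydrogen atoms i."""
    return sum(set_vals[i] + sum(math.log10(set_vals[j]) for j in nbrs)
               for i, nbrs in adjacency.items())


# Hydrogen-suppressed graph of 2-pentene, C1-C2=C3-C4-C5. The graph is the
# same for both isomers; only the charges differ.
two_pentene = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

# PLACEHOLDER inputs for illustration only. These are NOT computed charges
# and NOT the published regression coefficients.
INTERCEPT, SLOPE = 1.0, 1.0
q_isomer_a = {1: -0.20, 2: -0.17, 3: -0.16, 4: -0.13, 5: -0.21}
q_isomer_b = {1: -0.20, 2: -0.18, 3: -0.16, 4: -0.13, 5: -0.21}

i_set_a = i_set(set_values(q_isomer_a, INTERCEPT, SLOPE), two_pentene)
i_set_b = i_set(set_values(q_isomer_b, INTERCEPT, SLOPE), two_pentene)
print(f"isomer A: {i_set_a:.4f}  isomer B: {i_set_b:.4f}")
```

Because the molecular graph is identical for the two isomers, any difference between the two printed values comes entirely from the charges, which is the mechanism the text describes for separating cis- and trans-isomers.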

Ketones and aldehydes
For polar molecules such as aldehydes, ketones, esters and alcohols, the presence of heteroatoms such as oxygen considerably changes the charge distribution relative to the corresponding hydrocarbons, giving a small increase in the interactions between the solute and the stationary phase. An appropriate way to calculate the I_SET was developed that takes into account the dipole moment exhibited by these molecules and the atomic charges of the heteroatoms and of the carbon atoms attached to them (Souza et al., 2009a). When the stationary phase is treated as a non-polar material, the interactions are slightly stronger than for the corresponding hydrocarbons because of the charge redistribution that occurs in the presence of the heteroatom. This charge redistribution accounts for the dipole moment of the molecules. Thus, the dipolar charge distribution in such molecules leads to a small increase in the interactions of the solute with the stationary phase relative to hydrocarbons, where the dipole moment is zero or almost zero. Clearly, the major effects on the charge distribution due to the presence of the (oxygen) heteroatoms occur in their neighborhood, and the excess charge of these atoms leads to electrostatic interactions that are stronger than the weak dispersive dipolar interactions (Christian, 1990).
With regard to chromatographic retention, it can be observed, for instance, that 2-hexanone, 3-hexanone and hexanal have experimental retention indices of 767, 764 and 776, respectively, whereas for the corresponding hydrocarbon without the heteroatom, that is, heptane, the retention index is 700. The presence of the heteroatom (oxygen) thus increases the retention index by around 10%. Hence, the interactions between the molecules and the stationary phase are slightly increased, and clearly this is due to the charge redistribution that occurs in the presence of the heteroatom. This charge redistribution accounts for the dipole moment of molecules such as aldehydes and ketones. The dispersive force between these kinds of molecules and the stationary phase includes charge-dipole interactions and dipole-induced dipole interactions, which are weak relative to the electrostatic interactions. Thus, the dipolar charge distribution in such molecules leads to a small increase in the interactions of the solute with the stationary phase relative to hydrocarbons, where the dipole moment is zero. At first sight, these factors suggest that the retention index can be calculated as in equation 3, treating the heteroatoms in the same way but with subtle alterations that incorporate the effects of the dispersive dipolar interactions.
All of these factors can be included in the calculation of the retention index through a small increase in the SET_i values for the heteroatoms and the carbon atoms attached to them. This was carried out by multiplying the SET_i values of these atoms by a function A_µ which depends on the dipole moment of the molecule and on the net charges at the oxygen and carbon atoms (to include both the electrostatic and dispersive interactions). Since we must have A_µ = 1 when the dipole moment is zero or almost zero (as in the case of alkanes and alkenes), a first attempt considers a linear dependence on the molecular dipole moment µ, that is, A_µ = 1 + (µ/µ_F), where µ_F is a local function (in units of dipole moment) in the sense that it depends on the net charges of the oxygen and carbon atoms. On the one hand, this definition of A_µ works only if µ/µ_F ≪ 1, since A_µ must reflect the small increase in the interactions due to dipolar dispersive forces. On the other hand, good choices for the definition of µ_F for ketones and aldehydes (as we shall see below) mean that the ratio µ/µ_F can be much greater than unity, which shows clearly that the above definition of A_µ cannot be applied. Given that µ/µ_F > 1, A_µ cannot be a polynomial function of µ/µ_F. Thus, A_µ must have a weaker dependence on the dipole moment than the linear one, and this weak dependence can be achieved through a logarithmic function, since f(x) = x increases much faster than f(x) = log(1 + x). Taking these factors into account, it is possible to achieve a definition of A_µ that differs slightly from unity and is logarithmically dependent on the dipole moment of the molecule, as seen in equation 4:
$$A_\mu = 1 + \log\left(1 + \frac{\mu}{\mu_F}\right) \qquad (4)$$
where µ is the calculated molecular dipole moment and µ_F is a local function which depends on the charges of the atoms belonging to the C=O bond. Clearly, µ_F must be directly related to the net charge of the oxygen atoms, since it must reflect some contribution to the electrostatic interaction between these molecules and the stationary phase. In this regard, µ_F may also be related to the atomic charge of the carbon atom of the C=O functional group, or to the difference between the atomic charges of these atoms. Hence, µ_F can be defined in different ways, and several definitions were tested in preliminary calculations. As expected, after these preliminary calculations the best choice for ketones was µ_F = d|Q_C - Q_O|, where d is the calculated C=O bond length and |Q_C - Q_O| is the absolute value of the difference between the atomic charges at the carbon and oxygen atoms. This definition of µ_F is an attempt to take into account the contribution of the atomic charge of the oxygen atom and of the carbon atom bonded to it to the electrostatic interactions.
For aldehydes the terminal carbon atom of the C=O bond is attached to a hydrogen atom, and thus it is necessary to consider the net positive charge in this polar region of the molecule as the sum of the atomic charges of the carbon and hydrogen atoms. This means that for aldehydes the best choice was µ_F = d|Q_C + Q_H - Q_O|. Therefore, equation 4 indicates that there is an increase in the interaction between the molecules and the stationary phase due to the presence of the dipole moment, and that this contribution may be screened by the charge located on the heteroatoms (oxygen atoms) if µ/µ_F < 1, or may be increased if µ/µ_F > 1. In the case of ketones and aldehydes the local function µ_F is less than the dipole moment, showing that A_µ receives an appreciable contribution from the atomic charges of these atoms. This reveals the contribution of oxygen to the electrostatic interaction between the solute and the stationary phase. Therefore, to include the dispersive dipolar interactions in the calculation of the retention index, we multiply the SET_i values for the heteroatoms (oxygen) and the carbon atoms attached to them by the dipolar function A_µ given in equation 4. That is, in this model the I_SET is calculated as in equation 5:
$$I_{SET} = \sum_i \Big( A_\mu\, SET_i + \sum_{j \sim i} \log \big( A_\mu\, SET_j \big) \Big) \qquad (5)$$
where the SET_i values are obtained using equation 2. As in equation 3, the sum over i runs over all the atoms of the molecule (excluding the H atoms) and the inner sum over j runs over the atoms attached to atom i. In the above expression the dipolar function A_µ is taken as unity for the remaining carbon atoms of the molecule. Equation 5 reduces to equation 3 when the dipole moment of the molecule is zero or almost zero, as is the case for alkanes and alkenes, since A_µ = 1 for µ = 0.
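The dipolar correction can be sketched in Python as follows. This is an illustration under stated assumptions: A_µ is taken in the logarithmic form 1 + log10(1 + µ/µ_F) suggested by the discussion above, with the base of the logarithm assumed; the text does not state a unit conversion between d|ΔQ| and the dipole moment, so a `unit_factor` parameter (hypothetical, default 1.0) is exposed instead of a fixed one; and the scaled index assumes that A_µ multiplies the SET value of the group atoms wherever that value appears. All function names and the numerical inputs other than Q_H = +0.086 are placeholders, not calculated results.

```python
import math


def mu_f_ketone(d, q_c, q_o, unit_factor=1.0):
    """Local function for ketones: d * |Q_C - Q_O|."""
    return unit_factor * d * abs(q_c - q_o)


def mu_f_aldehyde(d, q_c, q_h, q_o, unit_factor=1.0):
    """Local function for aldehydes: d * |Q_C + Q_H - Q_O|."""
    return unit_factor * d * abs(q_c + q_h - q_o)


def a_mu(mu, mu_f):
    """Assumed logarithmic form: A = 1 + log10(1 + mu / mu_F)."""
    return 1.0 + math.log10(1.0 + mu / mu_f)


def i_set_polar(set_vals, adjacency, group_atoms, a):
    """Assumed form of the corrected index: SET values of the atoms in
    group_atoms are multiplied by a, all others by 1, then summed with the
    log10 of the (scaled) values of the attached atoms."""
    scaled = {i: (a if i in group_atoms else 1.0) * s
              for i, s in set_vals.items()}
    return sum(scaled[i] + sum(math.log10(scaled[j]) for j in nbrs)
               for i, nbrs in adjacency.items())


# Hydrogen-suppressed graph of 3-hexanone: C1-C2-C3(=O7)-C4-C5-C6.
hexan_3_one = {1: [2], 2: [1, 3], 3: [2, 4, 7], 4: [3, 5],
               5: [4, 6], 6: [5], 7: [3]}

# PLACEHOLDER inputs for illustration only (not AM1 output).
set_vals = {1: 1.0, 2: 0.9, 3: 0.8, 4: 0.9, 5: 0.9, 6: 1.0, 7: 0.9}
mu, d, q_c, q_o = 2.7, 1.23, 0.22, -0.29

a = a_mu(mu, mu_f_ketone(d, q_c, q_o))
corrected = i_set_polar(set_vals, hexan_3_one, group_atoms={3, 7}, a=a)
uncorrected = i_set_polar(set_vals, hexan_3_one, group_atoms={3, 7}, a=1.0)
print(f"A_mu = {a:.3f}, corrected = {corrected:.3f}, "
      f"uncorrected = {uncorrected:.3f}")

# Aldehyde variant, using Q_H = +0.086 from the hexanal example.
print(f"mu_F (aldehyde, placeholder d, Q_C, Q_O) = "
      f"{mu_f_aldehyde(d, q_c, 0.086, q_o):.3f}")
```

Passing `a=1.0` recovers the uncorrected sum, which mirrors the statement that the corrected expression reduces to the hydrocarbon case when the dipole moment vanishes.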
The 3-hexanone and hexanal molecules represented in the graph below are taken as an example of the I_SET calculation.
[Figure: molecular graphs of 3-hexanone and hexanal; for hexanal, Q_H = +0.086.]
The net atomic charges and SET_i values are given in Table 3. The I_SET calculation now follows:

Esters
For esters the major effects related to the charge distribution are due to the presence of the two oxygen atoms, and they occur on these atoms and in their neighborhood (their adjacent carbon atoms). The excess charge of these atoms leads to electrostatic interactions that are stronger than the weak dispersive dipolar interactions. For esters, all these factors were included in the calculation of the retention index through a small increase in the SET_i values for the heteroatoms and the carbon atoms attached to them (Souza et al., 2009b).
As in the case of ketones and aldehydes, it was verified that introducing the dipole moment of the molecule is not sufficient to explain the chromatographic behavior of these molecules. Thus, it was necessary to introduce an equivalent local dipole moment of the (-COOC-) group that contributes to the increase in the retention value. This was carried out by multiplying the SET_i values of the atoms belonging to the O=C-O-C group by the function A_µ, which depends on the dipole moment of the molecule and on the net charges of the oxygen and carbon atoms (to include both the electrostatic and dispersive interactions). The same approach used for ketones and aldehydes was applied to esters, that is, A_µ has a weaker dependence on the dipole moment than the linear one, as given in equation 4. For esters µ_F is an equivalent local dipole moment (in units of dipole moment) which depends on the charges of the atoms belonging to the O=C-O-C group. Clearly µ_F must be directly related to the net charge of the oxygen atoms, since it must reflect some contribution to the electrostatic interaction between these atoms and the stationary phase. In this regard, µ_F may also be related to the atomic charge of the carbon atom of the C=O functional group, or to the difference between the atomic charges of these atoms. Hence, µ_F can be defined in different ways, and several definitions were tested in preliminary calculations.
Esters have two oxygen atoms and thus it is possible to define two local functions, one dependent on the charges and bond length of the C=O$_1$ bond and another on the charges and bond length of the C-O$_2$ bond. Therefore, it was necessary to perform some calculations with different definitions for the equivalent local dipole moment. After the preliminary calculations it was found that for esters the charge difference, $Q_O - Q_C$, does not give reasonable results because the charges of the oxygen atoms mask the charge of the carbonyl carbon. As expected, our best choice for the esters was $\mu_{F1} = d_1|Q_{O1}|$ and $\mu_{F2} = d_2|Q_{O2}|$, where $d_1$ and $d_2$ are the calculated C$_1$=O$_1$ and C$_1$-O$_2$ bond lengths and $|Q_{O1}|$ and $|Q_{O2}|$ are the absolute values of the atomic charges of the oxygen atoms (O$_1$ and O$_2$). The equivalent local dipole moment is then calculated as the magnitude of the vectorial sum of the two dipole moments, that is, $\mu_F = (\mu_{F1}^2 + \mu_{F2}^2 + 2\mu_{F1}\mu_{F2}\cos\theta)^{1/2}$, where $\theta$ is the angle between the C=O$_1$ and C-O$_2$ bonds.
For formates, a specific charge distribution occurs in the polar region of the molecules, and the best mathematical model for the local moment was that which takes into account the contribution to the electrostatic interactions that originate from the atomic charges of the oxygen atoms, the carbon atoms and the H atom belonging to the C$_1$O$_1$O$_2$C$_{Al}$ group of the formate molecules (C$_{Al}$ represents the carbon on the alcoholic side). Thus, the equivalent dipoles were built from the net charges of the HC$_1$O$_1$, HC$_1$O$_2$ and O$_2$C$_{Al}$ groups of atoms. The equivalent dipoles associated with these net charges ($\mu_{F1}$, $\mu_{F2}$ and $\mu_{F3}$) are built from these net charges and the bond lengths, where $d_1$ and $d_2$ are the calculated C$_1$=O$_1$ and C$_1$-O$_2$ bond lengths and $d_3$ is the calculated C$_{Al}$-O$_2$ bond length. In a first approach, the local moments $\mu_{F2}$ and $\mu_{F3}$ are considered to be collinear and another equivalent dipole is obtained from the difference between them, that is, $\mu_{F4} = \mu_{F2} - \mu_{F3}$, and the final equivalent local moment is calculated as above, that is, $\mu_F = (\mu_{F1}^2 + \mu_{F4}^2 + 2\mu_{F1}\mu_{F4}\cos\theta)^{1/2}$, where $\theta$ is the angle between the C=O$_1$ and C-O$_2$ bonds. Hence, for formates the charge of the hydrogen atom attached to the carbon atom of the COO functional group is also considered, as in the case of aldehydes, because the charge of the H atom contributes explicitly to the positive charge of the local polar region of the molecule.
The above-mentioned best definitions for $\mu_F$ imply that the present approach to calculating the retention index considers important polar features of the organic functions, such as ketones, aldehydes and esters, through the information carried by the local moment $\mu_F$. In other words, according to Equation (4) there is an increase in the interaction between the molecules and the stationary phase due to the presence of a dipole moment, and this contribution may be screened by the charge located on the heteroatoms and the carbon atom of the C=O group if $\mu_F > \mu$, or may be increased if $\mu_F < \mu$. In the case of esters, the local function $\mu_F$ is less than the dipole moment, showing that $A_\mu$ has an appreciable contribution from the atomic charges of those atoms. This verifies, for esters, the contribution of the oxygen atoms to the electrostatic interaction between the solute and the stationary phase.
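The vector-sum construction of the equivalent local moment described above can be illustrated with a minimal Python sketch. The function names and the numerical inputs in the usage note are hypothetical and are not taken from the original work; charges are assumed to be in units of the elementary charge and bond lengths in ångströms, so the result is in e·Å (the conversion factor to debye is included only for convenience).

```python
import math

# 1 e*Angstrom expressed in debye (standard unit conversion).
E_ANGSTROM_TO_DEBYE = 4.80320


def bond_moment(bond_length, charge):
    """Equivalent bond dipole d*|Q| (e*Angstrom) from a bond length and a net charge."""
    return bond_length * abs(charge)


def combine_moments(mu_a, mu_b, theta_deg):
    """Magnitude of the vector sum of two moments separated by the angle theta."""
    theta = math.radians(theta_deg)
    squared = mu_a ** 2 + mu_b ** 2 + 2.0 * mu_a * mu_b * math.cos(theta)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, squared))


def ester_local_moment(d1, q_o1, d2, q_o2, theta_deg):
    """mu_F for esters from the C=O1 and C-O2 bond moments."""
    mu_f1 = bond_moment(d1, q_o1)
    mu_f2 = bond_moment(d2, q_o2)
    return combine_moments(mu_f1, mu_f2, theta_deg)


def formate_local_moment(mu_f1, mu_f2, mu_f3, theta_deg):
    """mu_F for formates: mu_F2 and mu_F3 are treated as collinear (first approach)."""
    mu_f4 = mu_f2 - mu_f3
    return combine_moments(mu_f1, mu_f4, theta_deg)
```

For example, `ester_local_moment(1.23, -0.35, 1.36, -0.28, 120.0)` combines two illustrative bond moments; multiplying the result by `E_ANGSTROM_TO_DEBYE` expresses it in debye. Whether and how the original implementation converts units is not stated in the text.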
Therefore, in the case of esters the $I_{SET}$ value is here calculated as in Equation (5), where the $SET_i$ values are obtained using Equation (2) through AM1 calculations of the net atomic charges. As mentioned above, Equation (5) is calculated by multiplying the $SET_i$ values of the atoms belonging to the C$_1$O$_1$O$_2$C$_{Al}$ group by the dipolar function $A_\mu$, which is taken as unity for the remaining carbon atoms of the molecules. Hence, Equation (5) is a general definition for the electrotopological index that can be applied to different organic functions, which are specified through appropriate definitions of the equivalent local moment $\mu_F$.
The preliminary applications of $I_{SET}$ as given by Equation (5) showed that this expression overestimates the calculated retention index for branched esters and underestimates the results for methyl esters. This finding reveals the need to consider other definitions for the local moment $\mu_F$ for branched esters and methyl esters. However, another easy choice is to take into account the steric effects for the branched esters and methyl esters. The simplest way to do this is to consider the steric hindrance of the C$_{Al}$ carbon atom of the C$_1$O$_1$O$_2$C$_{Al}$ group and the carbon atom attached to the acid side of the COO functional group (here named the C$_{Ac}$ carbon). As seen in Equation (2), the $\log SET_j$ factor gives, precisely, the steric effect of atom j. Thus, to include a steric correction (sc) in Equation (5) for branched esters, the term $sc = n \log SET(C_{Ac}) + n \log SET(C_{Al})$ was added, where n is the number of branches of the ester. On the other hand, for methyl esters the C$_{Al}$ carbon is bound to three H atoms and it is necessary to remove the overestimated steric effects of the $\log A_\mu SET_j$ terms in Equation (5). For methyl esters this is easily achieved by including a second steric correction (ssc), adding the term $ssc = -\log SET(C_{Al})$ to Equation (5). Very good results were obtained using this approach, which reveals that in this model the complex steric effects in branched esters can be included simply by considering the steric hindrance using the net charge (through the $SET_i$ values) of the two carbon atoms bound to the alcoholic and acid sides of the COO functional group. The calculation of $I_{SET}$ for a large number of molecules is easily carried out by means of a FORTRAN code developed in our lab that calculates $I_{SET}$ by reading the output data (calculated net charges, dipole moment and atomic positions) from AM1 semi-empirical calculations.
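The two steric corrections described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the logarithm is assumed to be base 10 (the text writes only "log"), the uncorrected value from Equation (5) is taken as an input rather than computed, and applying both corrections to an ester that is both branched and a methyl ester is an assumption not confirmed by the text. It is not the authors' FORTRAN code.

```python
import math


def steric_correction_branched(set_c_ac, set_c_al, n_branches):
    """sc = n*log SET(C_Ac) + n*log SET(C_Al) for a branched ester (log10 assumed)."""
    return n_branches * (math.log10(set_c_ac) + math.log10(set_c_al))


def steric_correction_methyl(set_c_al):
    """ssc = -log SET(C_Al) for a methyl ester (log10 assumed)."""
    return -math.log10(set_c_al)


def corrected_iset(iset_eq5, set_c_ac, set_c_al, n_branches=0, is_methyl_ester=False):
    """Add the applicable steric corrections to an I_SET value from Equation (5).

    iset_eq5 is the uncorrected index; how it is obtained is outside this sketch.
    """
    value = iset_eq5
    if n_branches > 0:
        value += steric_correction_branched(set_c_ac, set_c_al, n_branches)
    if is_methyl_ester:
        # Assumption: ssc is applied independently of the branching correction.
        value += steric_correction_methyl(set_c_al)
    return value
```

Keeping the corrections as separate functions makes it easy to test each term against hand calculations before combining them with the index itself.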

Alcohols
As observed for the preceding compounds, for alcohols the major effects on the charge distribution are due to the presence of the oxygen atom, and they occur at this atom and at its neighbors (adjacent carbon atoms). The excess charge at these atoms leads to electrostatic interactions that are stronger than the weak dispersive dipolar interactions. Thus, it is clear that it is necessary to introduce an equivalent local dipole moment for each of the organic functions that participate in increasing the retention value. For alcohols, as in the case of ketones, aldehydes and esters, this was achieved by multiplying the $SET_i$ values of the atoms belonging to the C-OH group by a function $A_\mu$ as defined by Equation (4), with $\mu_F$ being the equivalent local dipole moment, which is dependent on the charges of the atoms belonging to the C-OH group (Souza et al., 2010). Clearly, $\mu_F$ is directly related to the net atomic charge of the oxygen atom, since it must reflect some contribution to the electrostatic interaction between this atom and the stationary phase. Thus, $\mu_F$ may also be related to the atomic charge of the carbon atom of the functional group C-OH or to the difference between the atomic charges of these atoms. Hence, as with the other organic functions, $\mu_F$ can be defined in different ways and some of these definitions can be used in the preliminary calculations.
For primary alcohols the terminal carbon atom of the C-O bond is attached to two hydrogen atoms and thus it is necessary to consider the net positive charge in this polar region of the molecule as the sum of the atomic charges of the carbon and hydrogen atoms. Thus, for primary alcohols we found that the best definition of the local moment is related to the charges of all atoms at the polar head of the molecules, that is, $\mu_F = d\,|Q_C + (Q_{H1} + Q_{H2})/2 - Q_O - Q_{HO}|$, where d is the calculated C-O bond length and $|Q_C + (Q_{H1} + Q_{H2})/2 - Q_O - Q_{HO}|$ is the absolute value of the difference between the net atomic charge at the carbon ($Q_C$) plus the average charge of the hydrogen atoms attached to it, $(Q_{H1} + Q_{H2})/2$, and the charges of the oxygen atom ($Q_O$) and the hydrogen attached to it ($Q_{HO}$). For secondary, tertiary and quaternary alcohols the best choice for the local moment is related to the net atomic charge of the C and O atoms only, that is, $\mu_F = d\,|Q_C - Q_O|$, where d is the length of the C-O bond and $|Q_C - Q_O|$ is the absolute value of the difference between the charges of the carbon atom and the oxygen atom attached to it. These definitions of $\mu_F$ attempt to take into account the contribution to the electrostatic interactions originating from the polar region of the molecules. Therefore, this shows again that Equation (4) represents a dipolar contribution to the interactions between the molecules and the stationary phase (which originates from the presence of a molecular dipole moment), and this contribution is decreased by the charge of the heteroatom (the oxygen atom) when $\mu_F > \mu$, or increased when $\mu_F < \mu$. This reveals the contribution of oxygen to the electrostatic interaction between the solute and the stationary phase.
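The two local-moment definitions for alcohols (one for primary alcohols, one for secondary, tertiary and quaternary alcohols) can be written as a short Python sketch. The function names are hypothetical, the primary-alcohol expression follows the definition of the charge difference given for the polar head of the molecule, and charges are assumed to be in units of the elementary charge with the bond length in ångströms.

```python
def primary_alcohol_moment(d, q_c, q_h1, q_h2, q_o, q_ho):
    """mu_F = d*|Q_C + (Q_H1 + Q_H2)/2 - Q_O - Q_HO| for primary alcohols.

    d      : calculated C-O bond length
    q_c    : net charge of the carbon bonded to oxygen
    q_h1/2 : net charges of the two hydrogens on that carbon
    q_o    : net charge of the oxygen
    q_ho   : net charge of the hydroxyl hydrogen
    """
    return d * abs(q_c + (q_h1 + q_h2) / 2.0 - q_o - q_ho)


def other_alcohol_moment(d, q_c, q_o):
    """mu_F = d*|Q_C - Q_O| for secondary, tertiary and quaternary alcohols."""
    return d * abs(q_c - q_o)
```

With illustrative (not calculated) inputs, `primary_alcohol_moment(1.4, 0.0, 0.1, 0.1, -0.3, 0.2)` evaluates 1.4 × |0.0 + 0.1 + 0.3 − 0.2|.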
For alcohols, the $I_{SET}$ values are calculated as in Equation (5), where the $SET_i$ values are obtained using Equation (2) through AM1 calculations of the net atomic charges.

Development of QSPR models using the I SET
The molecular descriptor $I_{SET}$ was developed first for alkanes and alkenes on a low-polarity stationary phase and then extended to oxo-compounds through the inclusion of the molecular dipole moment and a local dipole moment in its definition. The models for the best simple linear regression between the retention index and the molecular descriptor ($RI = b + a\,I_{SET}$) and the statistical data for each class of compounds, obtained in previous QSRR studies, are summarized in Table 4. As can be seen from Table 4, the QSRR models for 179 representative linear and branched alkanes and alkenes, obtained with the $I_{SET}$ using the net atomic charge to calculate more precisely the $C_i$ fragment values of $I_{ET}$, were of good quality according to the statistical parameters obtained. This new descriptor $I_{SET}$ contains information on the 3D features of molecules and discriminates between geometrical isomers, such as cis- and trans-alkenes, and between conformers, and the elution sequence is correct for the majority of the compounds.
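A single-descriptor model of the form RI = b + a·I_SET can be fitted by ordinary least squares. The sketch below uses only the Python standard library and returns the slope, the intercept, the correlation coefficient r and the standard deviation of the residuals. The function name is hypothetical, and the use of N − 2 degrees of freedom for SD is an assumption; the text does not state which definition was used.

```python
import math


def fit_simple_linear(x, y):
    """Least-squares fit of y = b + a*x; returns (a, b, r, sd)."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx                      # slope
    b = mean_y - a * mean_x            # intercept
    r = sxy / math.sqrt(sxx * syy)     # Pearson correlation coefficient
    ss_res = sum((yi - (b + a * xi)) ** 2 for xi, yi in zip(x, y))
    sd = math.sqrt(ss_res / (n - 2))   # residual standard deviation (N - 2 assumed)
    return a, b, r, sd
```

Here `x` would hold the descriptor values and `y` the experimental retention indices for one class of compounds; no values from the tables are reproduced.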
The results obtained for aldehydes and ketones are similar to those reported by Ren (Ren, 2003) in multiple linear regression models for 33 aldehydes and ketones using Xu and AI topological indices, and by Héberger and co-workers using quantum-chemical descriptors (SW and µ) (Héberger et al., 2001) and physico-chemical properties ($T_{Bp}$, $M_W$, log P) (Héberger et al., 2000) for 31 and 35 compounds, respectively. For esters the results obtained by single linear regression using the $I_{SET}$ are better than those reported by Lu et al. (Lu et al., 2006). For SE-30 and OV-7 stationary phases the results are also better than those found by Liu et al. (Liu et al., 2007), and for more polar stationary phases the statistical parameters differ only slightly. Both of these studies use multiple linear regression (MLR) between RI and the topological indices for 90 saturated esters on stationary phases with different polarities.
Several authors have developed QSRR models, based on MLR, to predict the RI values for saturated alcohols. For example, Guo et al. (Guo et al., 2000), using MLR analysis and the artificial neural network technique, obtained the statistical parameters $r^2$ = 0.9982, SD = 8.21, N = 19 for an SE-30 stationary phase. In a previous study, the best statistical parameters of the MLR models obtained by Farkas and Héberger (Farkas & Héberger, 2005), employing four molecular descriptors, were r = 0.9804, SD = 14.22, $r^2_{CV}$ = 0.9801 and N = 44 for an OV-1 stationary phase. Therefore, our prediction results on low-polarity stationary phases, using the $I_{SET}$ as a single descriptor, showed statistical quality comparable to that of the similar studies reported by the above authors. Furthermore, the statistical parameters of the present approach are in good agreement with those obtained for alkanes and alkenes, aldehydes and ketones, saturated esters and alcohols using the previously developed semi-empirical topological index, $I_{ET}$. These results show clearly that $I_{SET}$ is a molecular descriptor that embodies in an appropriate manner the net atomic charges and charge distribution of molecules, since the retention index embodies the intermolecular interactions between the stationary phase and the molecules.
The fact that properties that are determined by intermolecular forces can be adequately modeled by the $I_{SET}$ descriptor can be easily seen in its relationship with the boiling point (BP). For alcohols a good correlation was obtained through a simple linear model ($BP = b + a\,I_{SET}$), as can be seen in Table 5.
The results in Table 6 indicate that the theoretical partition coefficients calculated using the $I_{SET}$ method are in good agreement with the experimental partition coefficients. The QSPR models obtained with $I_{SET}$ showed high values for the correlation coefficient (r > 0.99), and the leave-one-out cross-validation demonstrates that the final models are statistically significant and reliable ($r^2_{cv}$ > 0.98). As can be observed, this model explains more than 99% of the variance in the experimental values for this set of compounds. Among the various classes of compounds, the best results obtained with the $I_{SET}$ method are for hydrocarbons (Table 6), which is related to the fact that the present model was developed initially for this class of organic compounds. As can be seen in Table 6, the lowest standard deviation was obtained for the correlation of aldehydes and for alcohols the correlation was stronger. The range of standard deviations obtained verifies the applicability of the present approach to different classes of organic compounds. For alcohols, the earlier approach of Duchowicz et al. (Duchowicz et al., 2004), based on the concept of flexible topological descriptors and on the optimization of correlation weights of local graphic invariants, was applied to model the octanol/water partition coefficient of a representative set of 62 alcohols, resulting in a satisfactory prediction with a standard deviation of 0.22. Recently, Liu et al. (Liu et al., 2009) carried out a QSPR study to predict the log P for 58 aliphatic alcohols using novel molecular indices based on graph theory, dividing the molecular structure into substructures and obtaining models with good stability and robustness; the values predicted using the multiple linear regression method are close to the experimental values (r = 0.9959 and SD = 0.15). The above results show the reliability of the present model calculation based on the semi-empirical calculation of atomic charges and local dipole moments using only one descriptor, $I_{SET}$. This new approach to polar molecules, with the introduction of the remodeled $I_{SET}$ index including the contribution of the dipole moment of the molecule and an effective local dipole moment associated with the net charges of the atoms of the carbonyl group, opens new possibilities for studies on the chromatographic and other properties of different organic functions.
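Leave-one-out cross-validation of a single-descriptor linear model can be sketched as below. The cross-validated coefficient is computed here with the commonly used definition 1 − PRESS/SS_tot; that the same definition underlies the $r^2_{cv}$ values quoted above is an assumption, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y = b + a*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def loo_r2cv(x, y):
    """Leave-one-out cross-validated r^2, defined as 1 - PRESS / SS_tot."""
    x = list(x)
    y = list(y)
    press = 0.0
    for k in range(len(x)):
        # Refit the model without observation k, then predict it.
        a, b = fit_line(x[:k] + x[k + 1:], y[:k] + y[k + 1:])
        press += (y[k] - (b + a * x[k])) ** 2
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot
```

Because each prediction is made by a model that never saw the left-out compound, the resulting coefficient is never larger than the ordinary $r^2$ of the full fit, which is why it is reported alongside r as a measure of internal stability.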

Conclusions
It is known that the chromatographic separation process results from the forces that operate between solute molecules and the molecules of the stationary phase. These forces are called van der Waals forces, since van der Waals recognized them as the reason for the non-ideal behavior of real gases. Intermolecular forces are usually classified into two distinct categories: i) the first category corresponds to the directional, induction and dispersion forces, which are non-specific; and ii) the second category corresponds to hydrogen bonding forces and the forces of charge transfer or electron-pair donor-acceptor forces, which are specific.
In the development of the semi-empirical topological index ($I_{ET}$) it was considered that the retention of alkanes is due to the number and interaction of each specific carbon atom with the stationary phase, considered as non-polar, which is determined by its electrical characteristic and by the steric hindrance by other carbon atoms attached to it. In this case only dispersion forces operate: the continuous electronic movement results, at any instant, in a small dipole moment which can fluctuate and polarize the electron system of the neighboring atoms or molecules. For the alkenes, some carbon atoms with greater electronegativity give the molecules a dipole moment and for this reason, besides the dispersion forces, electrostatic forces play an important role. However, in this method the behavior of this kind of carbon atom is determined from the experimental data and indicated in specific tables. As the values were obtained from the experimental data, they encode the real physical interaction force. In the case of oxo-compounds, the presence of atoms with an electronegativity different from that of carbon introduces a dipole moment in the functional group and a change in the dipole moment of the whole molecule. These factors were considered in order to obtain the different values for the functional groups, and they were able to encode the physical force involved in the chromatographic separation.
The new semi-empirical electrotopological index ($I_{SET}$) demonstrated that the values for the carbon atoms that are not tetrahedral and for the functional groups (considering the new local dipole created by the heteroatom) can be calculated from the net atomic charges that are obtained from quantum-chemical calculations. In the case of esters, the major effects are due to the presence of the two oxygen atoms and their adjacent carbon atoms. As in the case of aldehydes and ketones, it was verified that the introduction of the dipole moment of the molecules is not sufficient to explain the chromatographic behavior. Thus, it was necessary to introduce an equivalent local dipole moment of the ester group that contributes to the increase in the retention value. In the case of esters, two local functions must be considered according to the charges and the bond lengths of the C=O and C-O bonds. Thus, the semi-empirical electrotopological index was developed based on the refinement of the previously developed semi-empirical topological index, unifying the quantum-chemical method with the topological method to provide a three-dimensional picture of the atoms in the molecule.
The $I_{ET}$ and $I_{SET}$ were generated to predict the chromatographic retention indices and other physicochemical properties and to obtain quantitative structure-retention and structure-property relationships (QSRR/QSPR). The efficiency and the applicability of these descriptors were demonstrated through the good statistical quality and high internal stability obtained for the different classes of compounds studied.
For esters: $\mu_{F1} = d_1|Q_{O1}|$ and $\mu_{F2} = d_2|Q_{O2}|$, where $d_1$ and $d_2$ are the calculated C$_1$=O$_1$ and C$_1$-O$_2$ bond lengths and $|Q_{O1}|$ and $|Q_{O2}|$ are the absolute values of the atomic charges of the oxygen atoms (O$_1$ and O$_2$).
For primary alcohols: $\mu_F = d\,|Q_C + (Q_{H1} + Q_{H2})/2 - Q_O - Q_{HO}|$, where d is the calculated C-O bond length and $|Q_C + (Q_{H1} + Q_{H2})/2 - Q_O - Q_{HO}|$ is the absolute value of the difference between the net atomic charge at the carbon ($Q_C$) plus the average charge of the hydrogen atoms attached to it, $(Q_{H1} + Q_{H2})/2$, and the charges of the oxygen atom ($Q_O$) and the hydrogen attached to it ($Q_{HO}$). For secondary, tertiary and quaternary alcohols the best choice for the local moment is related to the net atomic charge of the C and O atoms only, that is, $\mu_F = d\,|Q_C - Q_O|$, where d is the length of the C-O bond and $|Q_C - Q_O|$ is the absolute value of the difference between the charges of the carbon atom and the oxygen atom attached to it.

Table 1. Summary of the best simple linear regressions ($RI_{Calc} = a + b\,I_{ET}$) found for different data sets on low-polarity stationary phases.

Table 2. The net atomic charge (i) and the $SET_i$ values for each carbon atom of cis-2-pentene and trans-2-pentene molecules.

Table 3. The net atomic charge (i) and the $SET_i$ values for each carbon and oxygen atom of 3-hexanone and hexanal molecules.

Table 4 .
For esters and alcohols, good correlations between the retention index and the $I_{SET}$ were also obtained for stationary phases with different polarities (not included in Table 4). The good statistical results achieved (Table 4) employing $I_{SET}$ are better than or equivalent to those obtained using multiple linear regression employing many molecular descriptors.

Table 4. Summary of the best simple linear regressions ($RI_{Calc} = b + a\,I_{SET}$) found for different classes of compounds on low-polarity stationary phases.

Table 5 .
The QSPR model obtained for the experimental BP of 134 compounds showed high values for the coefficient of determination and the cross-validation coefficient, showing the good predictive capacity of the model.

Table 5. The coefficients and statistical parameters for linear regression between experimental boiling point and $I_{SET}$.
This model can explain 98.20% of the variance in the experimental values, and most of these compounds (N = 101) are not included in the initial model used to build the $I_{SET}$, showing the external stability of the model. These results are similar to those obtained using $I_{ET}$ for 146 aliphatic alcohols and can be compared with those obtained by Ren (Ren, 2002b), who used MLR models with five descriptors for 138 compounds. The octanol-water partition coefficient (log P) of compounds, which is a measure of hydrophobicity, is widely used in numerous Quantitative Structure-Activity Relationship (QSAR) models for predicting the pharmaceutical properties of molecules. The partition coefficient is a property that is determined by intermolecular forces and thus it is expected that it can be described by a molecular descriptor such as $I_{SET}$. The results obtained in the statistical analysis of the single linear regression between experimental log P values and $I_{SET}$ are shown in Table 6 for each class of compounds.

Table 6. The coefficients and statistical parameters for linear regression between experimental log P values and $I_{SET}$.