Adjustments for the training and validation phases for the RS and ANN models selected by the authors. Determination coefficient (R2) and root mean square error (RMSE) for the response surface (RS) and neural (ANN) models. The subscript
Two types of predictive models developed in our laboratory, one based on artificial neural networks (ANN) and one on quadratic regression, are summarized in this book chapter. Both were developed to predict the density, speed of sound, kinematic viscosity and surface tension of amphiphilic aqueous solutions, using the concentration, the number of carbons and the molecular weight as input variables. The experimental data were compiled from the literature and covered five surfactants: i) hexyl, ii) octyl, iii) decyl, iv) tetradecyl and v) octadecyl trimethyl ammonium bromide. The neural models present better adjustment values than the quadratic regression models, with R2 values above 0.902 and AAPD values under 2.93% (for all data). Finally, it is concluded that the quadratic regression and neural models can be powerful prediction tools for the physical properties of aqueous surfactant solutions.
- physical properties
- artificial neural network
Amphiphilic compounds have a well-defined structure with two clearly differentiated parts, which determines their behavior in aqueous systems and is the key factor in their relationship with the internal and external interfaces of those systems. One part of the amphiphilic compound is hydrophilic and the other is hydrophobic [1, 2], and both are linked by a covalent bond.
In aqueous systems, which are the most important application of surfactants in terms of volume and economic impact, the hydrophobic group is generally a long-chain hydrocarbon (although i) fluorinated, ii) oxygenated hydrocarbon or iii) siloxane chains can also be used) and the head, or hydrophilic group, is an ionic or highly polar group. The different types of amphiphilic molecules can be distinguished by the way their hydrophilic and hydrophobic parts are bonded. For example: i) a hydrophilic head can be covalently bound to a single, double or triple hydrophobic alkyl chain; ii) a bolaform amphiphile consists of two hydrophilic heads covalently linked by a hydrophobic alkyl chain; and iii) a Gemini amphiphile consists of two surfactants covalently linked through their charged heads. These compounds can also be classified by the chemical nature of their hydrophilic group, with subgroups according to the tail, so that four basic categories can be defined: i) anionic, ii) cationic, iii) nonionic and iv) amphoteric (and zwitterionic).
The ability of amphiphiles to self-assemble in aqueous solution into well-defined structures makes them interesting molecules that can be applied in different fields, such as:
in the pharmaceutical field, to overcome i) the high manufacturing costs, ii) the poor pharmacokinetic characteristics and iii) the low bacteriological efficiency of natural cationic antimicrobial peptides (AMPs), using novel and diverse cationic amphiphiles that mimic the amphiphilic topology of AMPs, or even as anti-cancer drug delivery vehicles using block copolymer micelles (poly(ethylene oxide) and poly(L-amino acid)),
in the cleaning sector, where mixed solutions of sulfonated methyl esters of fatty acids (SME), with dodecyldimethylamine oxide as cosurfactant, were used to clean oily deposits from solid surfaces. Yavrukova et al. studied the cleaning of porcelain and stainless steel and concluded that SME mixtures can be a promising system for household detergent formulations,
in Chemistry, where this kind of molecule is studied for the development of supramolecular nanotube architectures,
in Medical Science, to accelerate wound healing using antioxidant shape amphiphiles, or
As previously said, these kinds of molecules can form different types of aggregates. These structures are formed when a certain concentration, called the critical micelle concentration (cmc), is reached. This parameter can be defined as the specific concentration, for a particular surfactant, at which certain solution properties change sharply. According to Myers, different authors showed that the type of aggregate structure depends on what is known as the critical packing parameter. This parameter (CPP = v/(a0·lc)) establishes the relationship between the volume of the hydrophobic part of the molecule (v), the optimal area of the head group (a
According to Gómez-Díaz et al. [1, 12], different physical properties have been used to characterize aggregation processes from measured experimental values. These authors demonstrated that density and kinematic viscosity do not change when the micellization point is reached, so they are not used to obtain information about the behavior of the colloidal aggregate [1, 12]. On the other hand, the variation of the other measured properties, speed of sound and surface tension, can be used to determine the cmc value. The variation of the property gives rise to two trend lines whose intersection can be used to determine the cmc [1, 12]. As reported by Gómez-Díaz et al. [1, 12], the cmc values obtained from the surface tension and from the speed of sound were similar. Nevertheless, for hexyl, octyl and decyl trimethyl ammonium bromide, the cmc value obtained from the surface tension was slightly lower than the one obtained from the speed of sound [1, 12] (which can be attributed to the effect of small amounts of impurities on the surface tension value).
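The two-trend-line procedure described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the manually chosen `split` index and the data points are hypothetical, and the data are synthetic (constructed so that the slope changes at a concentration of 0.06).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cmc_from_trend_lines(conc, prop, split):
    """Estimate the cmc as the intersection of two fitted trend lines.

    `split` is the index of the first point assigned to the second
    (post-micellar) branch; here it is chosen by inspection (assumption).
    """
    m1, b1 = fit_line(conc[:split], prop[:split])
    m2, b2 = fit_line(conc[split:], prop[split:])
    if m1 == m2:
        raise ValueError("parallel trend lines: no intersection")
    return (b2 - b1) / (m1 - m2)


# Synthetic, illustrative data (not measured values)
conc = [0.01, 0.02, 0.03, 0.04, 0.05, 0.08, 0.10, 0.12, 0.14]
prop = [1500.0 + 100.0 * c if c <= 0.06 else 1506.0 + 20.0 * (c - 0.06)
        for c in conc]

cmc = cmc_from_trend_lines(conc, prop, split=5)
print(f"estimated cmc: {cmc:.4f}")
```

In practice the choice of which points belong to each branch affects the estimate, so the split would normally be checked against a plot of the property versus concentration.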
Studying the behavior of these solutions to determine their properties and calculate the cmc requires a great deal of work, time and material. Modeling the physical properties of these solutions could therefore help to reduce material and time costs. Methodologies such as artificial neural networks (ANN) and response surface (RS) models are thus of interest, and for this reason a study of this possibility was carried out in our research group by Astray & Mejuto.
On the one hand, response surface methodology (RSM) was first described by Box and Wilson in 1951 [13, 14]. RSM is used as a tool for optimization tasks by relating the process variables to the response [15, 16]. The experimental data are fitted to a polynomial equation that must describe the behavior of the data in order to make statistical predictions; this methodology is therefore based on the development of empirical mathematical models that describe the system under study. These models can be used when the response, or responses, are influenced by different variables. An RSM model can work with a reduced number of experimental trials and can be used to develop, improve and optimize different processes. RSM uses a set of mathematical and statistical tools to fit the experimental data to an equation, usually a linear or quadratic polynomial function [17, 18]. Different experimental designs can be used, which randomize the experimental error and distribute the experimental points evenly, for the independent variables, over the range investigated. RSM models can be applied in different areas such as:
in Chemical Engineering, to extract alumina from coal fly ash by optimizing the variables involved in the process (K2S2O7/Al2O3 molar ratio, calcining temperature and calcining time),
in Environmental Science, to study the biodegradation of the strobilurin fungicide Pyraclostrobin using bacteria from orange cultivation plots in order to develop a bioremediation method,
in Biomedical applications, to extract anthocyanins from blueberry by optimizing the ultrasonic time, ultrasonic temperature, freezing time and liquid–solid ratio, or
in Biotechnology, to optimize the culture media and reduce the production cost of urease bacteria, achieving an eco-friendly process by controlling different parameters (yeast extract, whey and heating temperature), inter alia.
On the other hand, artificial neural networks are computational modeling tools that consist of a set of simple, massively interconnected processing elements (neurons) capable of processing data. This kind of model tries to simulate the way in which the human brain processes information; that is, ANNs are inspired by biological systems. An ANN is made up of different layers of neurons: an input layer that receives the information, one or more intermediate (or hidden) layers where the information is processed, and an output layer, with one or more neurons, where the predicted value is generated (Figure 2). Each neural network is characterized by a specific topology or architecture. To facilitate identification, each neural model can be named i-h-o, using the number of neurons in the input (i), hidden (h) and output (o) layers.
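The i-h-o naming can be illustrated with a minimal forward pass through a 3-7-1 network. The sketch below assumes that each neuron computes a weighted sum of its inputs plus a bias followed by a sigmoid activation; the weights are random, the three input values are hypothetical scaled inputs, and nothing here reproduces the software used by the authors.

```python
import math
import random


def sigmoid(z):
    """Logistic activation (assumed for illustration)."""
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hidden, n_out, seed=0):
    """Build an i-h-o network with random weights.

    Each layer is a list of neurons; each neuron is a list holding one
    weight per incoming connection plus a trailing bias term.
    """
    rng = random.Random(seed)

    def layer(n_from, n_to):
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_from + 1)]
                for _ in range(n_to)]

    return [layer(n_in, n_hidden), layer(n_hidden, n_out)]


def forward(network, inputs):
    """Propagate the inputs layer by layer and return the output values."""
    activations = list(inputs)
    for layer in network:
        activations = [
            sigmoid(sum(w * a for w, a in zip(neuron[:-1], activations))
                    + neuron[-1])
            for neuron in layer
        ]
    return activations


# 3-7-1 topology: three inputs, seven hidden neurons, one output
net = init_network(3, 7, 1)
output = forward(net, [0.2, 0.5, 0.8])  # hypothetical scaled inputs
print(output)
```

With untrained random weights the output is meaningless; training consists of adjusting these weights so that the output approaches the measured property.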
These models present different advantages: they are non-linear systems that allow a better data fit, they are insensitive to noise (uncertain data and measurement errors) and they present high parallelism (fast processing and failure tolerance), among others. According to Baş and Boyaci, ANNs represent non-linearities better than RS models, although ANNs cannot produce a model equation like the one RS models provide. This kind of approach can be used in a multitude of fields such as:
in Food Technology, to determine the botanical origin of honey using different parameters (ash content, electrical conductivity, among others) or food authenticity (carried out in our laboratory),
in Renewable Energy, to predict three components of solar irradiation in Odeillo (France), or
This book chapter summarizes the quadratic regression and neural models developed in our research group to predict, for amphiphilic aqueous solutions, the i) density (
2. Material and methods
2.1 Artificial neural networks as an approximation approach
Artificial Intelligence models based on artificial neural networks have been widely used in the area of chemistry to model and predict processes related to physical properties. This type of model has shown great reliability to model and predict density, dynamic viscosity, and surface tension, among others.
A good example of the use of artificial neural networks to determine properties of interest in micellar systems is the research carried out by Katritzky et al., who developed a model to predict the critical micellar concentration (CMC) of non-ionic surfactants from different parameters related to their molecular structure. According to the authors, the models developed could be used for the prediction or analysis of new non-ionic surfactants similar to those used in their research. On the other hand, Fatemi et al. developed a model based on artificial neural networks to predict the CMC of different anionic and cationic compounds. The selected input variables included the Balaban index and the heat of formation, among others. The results obtained were compared with the predictions of a multiple linear regression (MLR) model, and the neural network was shown to be superior to the MLR model for predicting the log CMC of anionic and cationic surfactants. Along the same lines, Kardanpour et al. reported a wavelet neural network (WNN) to predict the CMC of Gemini surfactants. The developed model used twelve different descriptors derived from the molecular structure. According to the authors, the results reveal the ability of the model to determine the CMC and demonstrate that models based on neural networks are superior to the MLR approach (due to the ability of the WNN model to work with nonlinearities between the input variables and the CMC).
The studies listed above demonstrate the ability of artificial neural networks to predict the critical micellar concentration of different surfactants. However, to obtain the CMC value, it is necessary to carry out different experimental studies on a particular property whose abrupt variation allows the CMC to be determined. Two of these properties are surface tension and speed of sound, whose measurement requires a great deal of working time and expense in labour and reagents. The experiments carried out for each property determine the CMC from the intersection of the two trend lines (as mentioned above [1, 12]). For these reasons, an ANN approach could be very useful to lower costs and make approximations easier, so models capable of predicting this variable for different mixtures could be a highly recommendable tool. The claim that artificial neural networks are useful because they can reduce experimental time and operating costs is supported by different studies reported in the literature. An example is the study carried out by Belhaj et al., who used artificial neural networks to predict absorption values for alkyl ether carboxylate (AEC) and alkyl polyglucoside (APG). Thus, this book chapter summarizes the research carried out in our research group to predict the density, speed of sound, kinematic viscosity and surface tension of amphiphilic aqueous solutions.
In addition to our work, surface tension modeling has also been carried out by different authors. Khazaei et al. developed an ANN to predict the surface tension of multicomponent mixtures at different temperatures. The input variables were the reduced temperature, the critical pressure and volume, and the acentric factor of the mixture. The average absolute relative deviations obtained were low and the ANN model, compared with well-known models (Brock-Bird equation, Flory theory and group contribution theory), showed a high prediction capacity. The authors concluded that ANNs can be helpful for engineering calculations and emphasized that the ANN model can be a robust approach to predict complex input–output systems. Other interesting research was carried out by Gharagheizi et al., who developed neural models to determine the surface tension of pure compounds at different temperatures and atmospheric pressure. The authors investigated compounds belonging to 78 different chemical families and the results were satisfactory (according to different statistical parameters), with an absolute average deviation of 1.7% and a squared correlation coefficient of 0.997. Bakeri et al. used 20 hydrocarbon mixtures to determine the surface tension. The model developed by the authors showed the best accuracy when compared with four other well-known classical models. Density and kinematic viscosity, in this case for different systems of biofuels and their blends with diesel fuel, can also be predicted using ANNs. Two artificial neural networks were developed to predict density and kinematic viscosity. The models used six input variables (temperature and volume fractions, among others). The results reported by the authors indicate that the models obtained good correlations. The density and speed of sound of binary ionic liquid and ketone mixtures can also be predicted by ANNs. In this case, the artificial neural network models used the temperature and the mole fraction, among others, as input variables to determine these two properties. The models developed presented an overall average percentage error lower than 2.5%, so the authors concluded that this approach is applicable to the prediction of these properties in binary ionic liquid and ketone mixtures.
Nevertheless, the use of artificial neural networks is not limited to the prediction of the previous properties; ANNs can also be used for the tensammetric analysis of different nonionic surfactants (Brij 30, 35, 56 and 96). The authors concluded that ANNs are a possible candidate for the determination of nonionic surfactants. Another interesting study is the one by Jha et al., who developed a three-layer feedforward artificial neural network to predict the diffusion coefficient of a micellar system with sodium dodecyl sulfate (SDS). The model uses the temperature and the NaCl and SDS concentrations as input variables. The ANN is capable of modeling the experimental behavior (correlation coefficient higher than 0.99), and it is concluded that the model can be used to calculate this property. ANN models can also be used to investigate the factors that affect particle size in a nanoemulsion system (virgin coconut oil) that contains copper peptide. To predict the particle size, the model used four input variables: the amounts of virgin coconut oil, Tween 80:Pluronic F68, xanthan gum and water. The ANN demonstrated its ability to model the particle size from the four input variables and showed good determination coefficients, higher than 0.97. Finally, another interesting study is that of Rocabruno-Valdés et al., in which the authors developed artificial neural models to predict different properties of biodiesel (dynamic viscosity, density and cetane number) using as input variables the temperature, the number of carbon and hydrogen atoms and the methyl ester composition. The correlation coefficients obtained were higher than 0.91. According to the authors, the ANN models provide an adequate prediction and can be of interest for inclusion in simulators.
To carry out this work, the experimental data obtained by Gómez-Díaz et al. [1, 12] were used. The surfactants used were hexyl trimethyl ammonium bromide (HTABr), octyl trimethyl ammonium bromide (OTABr) and decyl trimethyl ammonium bromide (DTABr) from one of these studies, and tetradecyl trimethyl ammonium bromide (TDTABr) and octadecyl trimethyl ammonium bromide (ODTABr) from the other [1, 12]. All these reagents were supplied by Fluka with a purity ≥98% [1, 12]. The authors prepared the aqueous solutions by mass using a Kern 770 analytical balance (precision 10−4 g) [1, 12].
The output variables for each aqueous solution were determined (at 298 K) with different instruments: i-ii) the density and speed of sound using an Anton Paar DSA 5000 vibrating-tube densimeter and sound analyzer, iii) the kinematic viscosity from the transit time of the liquid meniscus through a capillary viscosimeter (supplied by Schott) and iv) the surface tension using a Krüss K-11 tensiometer with the Wilhelmy plate method [1, 12].
2.3 Modeling procedure for predictive models
The surface model, which is used to evaluate the influence of each input variable on the physical properties (density, speed of sound, kinematic viscosity and surface tension), combines the input variables in linear, quadratic and cross-product terms. That is, the experimental data can be approximated using a generalized second-order polynomial model (Eq. (1)). In this sense, the model was used to correlate each dependent variable (
Response surface methodology was conceived for experiments planned after a prior analysis of the relationship between the variables (generally standardized), with a homogeneous distribution of the experimental points. Nevertheless, in this case, the experimental data used are not homogeneously distributed and have not been standardized.
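A generalized second-order polynomial of the kind described above (intercept, linear, quadratic and cross-product terms) can be fitted by ordinary least squares. The following is a minimal sketch under stated assumptions: the function names are hypothetical, only two generic inputs are used for brevity, and the data are synthetic values generated from known coefficients. The authors performed their RS modeling in Microsoft Excel, so this is not their implementation.

```python
from itertools import combinations


def quadratic_terms(x):
    """Expand inputs into [1, x_i, x_i^2, x_i * x_j] model terms."""
    terms = [1.0]
    terms += list(x)
    terms += [xi * xi for xi in x]
    terms += [xi * xj for xi, xj in combinations(x, 2)]
    return terms


def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (m[r][n] - tail) / m[r][r]
    return solution


def fit_response_surface(xs, ys):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y."""
    design = [quadratic_terms(x) for x in xs]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, x):
    """Evaluate the fitted polynomial at a single point."""
    return sum(c * t for c, t in zip(coeffs, quadratic_terms(x)))


# Synthetic example: a 3 x 3 grid of two inputs and known coefficients
grid = [(x1, x2) for x1 in (0.0, 1.0, 2.0) for x2 in (0.0, 1.0, 2.0)]
true_coeffs = [1.0, 2.0, -1.0, 0.5, 0.25, 0.1]
ys = [predict(true_coeffs, x) for x in grid]
coeffs = fit_response_surface(grid, ys)
print([round(c, 6) for c in coeffs])
```

The same term expansion applies to three inputs; solving the normal equations directly is adequate for a small, well-spread data set such as this grid, but it becomes numerically fragile when input variables are strongly correlated.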
The other predictive model used is based on artificial neural networks. ANNs require the data to be split into at least two groups, training (T) and validation (V), which was done randomly by the authors. The training data set was used to train the ANN model, while the validation data set was used to check that the model had been trained properly. An important aspect of this methodology is that it is based on a trial-and-error procedure to find the optimal combination of parameters for prediction. Once the database is presented to the input layer, the training can start: the data are propagated to the first intermediate layer and the information is treated by the propagation function (Eq. (2)) to obtain a single value (
2.4 ANN’s parameters
The authors used a total of 80 cases to develop the different prediction models (RS and ANN). The database was divided into two groups: a first group, with 75% of the cases (60), to train the model, and a second group, with the remaining 25% (20), to validate it.
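A random 75/25 division of 80 cases can be sketched as follows. The function name, the integer case identifiers and the fixed seed are hypothetical and only serve to make the illustration reproducible; the actual assignment of cases used by the authors is not reproduced here.

```python
import random


def split_cases(cases, train_fraction=0.75, seed=42):
    """Randomly split cases into training and validation groups."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


cases = list(range(80))  # hypothetical case identifiers
training, validation = split_cases(cases)
print(len(training), len(validation))
```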
The learning rate and momentum values were set at 0.7 and 0.8, respectively. The models were developed with different numbers of training cycles in order to locate the point beyond which they could become overtrained.
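The role of these two parameters can be illustrated with the conventional gradient-descent update with momentum, in which each weight change is the learning rate times the negative gradient plus the momentum times the previous change. This is the textbook rule, assumed here for illustration; the exact update applied internally by the modeling software is not stated in this chapter, and the one-dimensional error function below is hypothetical.

```python
LEARNING_RATE = 0.7  # value reported in the text
MOMENTUM = 0.8       # value reported in the text


def momentum_step(weight, gradient, previous_delta,
                  lr=LEARNING_RATE, momentum=MOMENTUM):
    """Return the updated weight and the weight change that was applied."""
    delta = -lr * gradient + momentum * previous_delta
    return weight + delta, delta


# Toy problem (assumption): minimise E(w) = (w - 3)^2, so dE/dw = 2 (w - 3)
w, delta = 0.0, 0.0
for _ in range(200):
    grad = 2.0 * (w - 3.0)
    w, delta = momentum_step(w, grad, delta)

print(round(w, 6))
```

On this toy problem the weight oscillates around the minimum with decreasing amplitude, which is the typical behavior of a momentum term: it speeds up progress along consistent gradient directions at the cost of some overshoot.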
2.5 Adjustments parameters
The results were analyzed by the authors using different statistics to determine the adjustment power: the coefficient of determination (R2), the root mean square error (RMSE) (Eq. (4)) and the average absolute percentage deviation (AAPD) (Eq. (5)), for the training and validation phases. Individual percentage deviations (IPD) were also used.
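These statistics can be computed as in the sketch below, which uses their conventional definitions. Several points are assumptions made for illustration: R2 is computed as one minus the ratio of residual to total sum of squares, the IPD is taken as the signed percentage difference between predicted and observed values, and the three data points are invented.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the property."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def ipd(observed, predicted):
    """Individual percentage deviation of each prediction (signed)."""
    return [100.0 * (p - o) / o for o, p in zip(observed, predicted)]


def aapd(observed, predicted):
    """Average of the absolute individual percentage deviations."""
    deviations = ipd(observed, predicted)
    return sum(abs(d) for d in deviations) / len(deviations)


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed form)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented values, for illustration only
observed = [1.0, 2.0, 4.0]
predicted = [1.1, 1.9, 4.0]

print(r_squared(observed, predicted))
print(rmse(observed, predicted))
print(aapd(observed, predicted))
```

Because the AAPD is relative to the observed value, it allows models for properties with very different units and magnitudes to be compared on a common percentage scale, which the RMSE does not.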
2.6 Computer equipment and software
The input variables necessary to determine the desired variables were obtained from Sigma-Aldrich and from the ChemDraw Professional 15 trial version (PerkinElmer). Microsoft Excel Professional Plus 2013 (Microsoft) was used for RS modeling, and the software EasyNN plus v14.0d (Neural Planner Software Ltd.) was used for ANN modeling. A computer server with an Intel® Core™ i7 processor and 16 GB of RAM was used to develop the models.
The figures of this book chapter were made with Inkscape 0.92 and Microsoft PowerPoint Professional Plus 2016 (Microsoft).
3. Results and discussion
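[Table 1. Determination coefficient (R2) and root mean square error (RMSE) of the RS and ANN models; columns: Training phase, Validation phase.]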
Response surface models present good determination coefficients in the training phase, varying between the value obtained for the density model (0.994) and the value obtained for the kinematic viscosity model (0.906). These good values contrast with the value obtained for the surface tension model which reports a low determination coefficient value (0.505).
For the first three models (density, speed of sound and kinematic viscosity), the determination coefficients obtained in the validation phase are similar to those of the training phase (with a minimal decrease), varying between the value obtained for the density model (0.985) and the value obtained for the kinematic viscosity model (0.885). The worst-performing response surface model in the training phase, the one developed to predict surface tension, showed a determination coefficient of 0.503 in the validation phase, similar to that obtained in the training phase (0.505).
Regarding the root mean square error values obtained by the response surface models developed by Astray & Mejuto, the density model presents an RMSE value around 0.001 g·cm−3 in both phases, and the speed of sound model around 7.1837 m·s−1 and 6.3226 m·s−1 in the training and validation phases, respectively. The model developed to predict kinematic viscosity presents an RMSE value around 0.1003 mm2·s−1 for the training phase and 0.0569 mm2·s−1 for the validation phase, and the worst model developed, the surface tension model, 8.3304 mN·m−1 and 8.1307 mN·m−1 for the training and validation phases, respectively. The size of these errors can best be understood in terms of average absolute percentage deviation. The AAPD values reported for each phase are very similar. The errors obtained for each model (for all data) were 0.08%, 0.31%, 5.18% and 14.73% for the density, speed of sound, kinematic viscosity and surface tension models, respectively. The AAPD values for the density and speed of sound models are very low, and the 5.18% error of the kinematic viscosity model can be considered acceptable. The error that is not acceptable is the one reported by the surface tension model (14.73%), since it is clearly much higher than the rest and above the 10% that is considered, in our laboratory, the limit for an acceptable error.
With all this, it can be said that the models designed to determine the density, the speed of sound and the kinematic viscosity are useful for the prediction of these properties. The model to predict the surface tension should not be used due to its high AAPD.
The adjustments for the ANN models developed are shown in Table 1. The ANN models were developed by trial and error to obtain the best model for each predicted output variable (more than 400 neural networks were developed). The models selected by the authors present different topologies: i) 3-7-1 for the density model, ii) 3-5-1 for the speed of sound model, iii) 3-6-1 for the kinematic viscosity model and iv) 3-1-1 for the surface tension model. Thus, each model has three variables in the input layer (concentration, number of carbons and molecular weight), and the intermediate layer varies from a single neuron, for the surface tension model, to seven neurons, for the density model. In addition, each selected model has a different number of training cycles.
It can be observed (Table 1) that, in general, the ANN models provided by the authors fit the desired variables properly, both in the training and in the validation phase. The density model is the one with the best adjustments: in terms of determination coefficient and root mean square error, it presents values of 0.999 and 0.0004 g·cm−3, respectively, for the training phase and 0.999 and 0.0003 g·cm−3, respectively, for the validation phase. As was the case for the RS models, the density model is the best-fitting model; in fact, the AAPD values reported for both phases were 0.02%.
The behavior of the rest of the models follows the pattern of the RS models; that is, the models to predict the speed of sound and the kinematic viscosity are, in this order, the second- and third-best models.
The model designed to predict the speed of sound presents adjustments, in terms of coefficient of determination, very close to those of the density model (0.998 in both phases), with relatively low RMSE values (1.9393 m·s−1 and 1.7093 m·s−1).
The kinematic viscosity model has a slightly lower adjustment than the previous two models. In terms of the determination coefficient, the value for the training phase remains similar to that of the two previous models; for the validation phase, however, this value falls slightly to 0.994. Even so, the model seems to predict the kinematic viscosity values correctly, especially taking into account the low RMSE values (0.0108 mm2·s−1 and 0.0104 mm2·s−1 for training and validation, respectively).
Finally, the worst model developed using artificial neural networks is the one designed to determine surface tension. Table 1 shows that the values obtained fall significantly; in fact, the determination coefficient falls to 0.449 and 0.457 for the training and validation phases, respectively. This low determination coefficient indicates that the model cannot make correct predictions, which is confirmed by the high RMSE values obtained for the training and validation phases (9.6859 mN·m−1 and 9.6827 mN·m−1, respectively).
As stated above, the size of the errors made by the different ANN models can best be understood in terms of AAPD. In this case, the errors obtained (for all data) by density, speed of sound, kinematic viscosity and surface tension model were: 0.02%, 0.10%, 0.62% and 18.13%, respectively.
As with the response surface models, the ANN surface tension model should not be used to predict surface tension (AAPD above 10%). The other three models can be used for prediction.
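The acceptance criterion applied in this section can be written down explicitly. The sketch below takes the AAPD values (for all data) reported in the text for the RS and ANN models and applies the 10% limit used in our laboratory; the dictionary layout and function name are hypothetical.

```python
# AAPD values (%) for all data, as reported in the text
AAPD_ALL_DATA = {
    "RS": {
        "density": 0.08,
        "speed of sound": 0.31,
        "kinematic viscosity": 5.18,
        "surface tension": 14.73,
    },
    "ANN": {
        "density": 0.02,
        "speed of sound": 0.10,
        "kinematic viscosity": 0.62,
        "surface tension": 18.13,
    },
}

ACCEPTABLE_AAPD = 10.0  # acceptance limit stated in the text (%)


def usable_models(aapd_by_property, limit=ACCEPTABLE_AAPD):
    """Return the properties whose model error is within the limit."""
    return [name for name, value in aapd_by_property.items() if value <= limit]


usable = {family: usable_models(values)
          for family, values in AAPD_ALL_DATA.items()}

for family, names in usable.items():
    print(family, "->", ", ".join(names))
```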
3.1 Comparison of response surface and neural models
Once the models have been analyzed separately, it is necessary to make a comparison between them.
As previously stated, the models to predict density are the best models according to the adjustments. This means that these models are useful to predict this physical property of aqueous surfactant solutions (at least for the surfactants studied).
On the one hand, although in general, the AAPD in the RS
The second-best models, based on their adjustments, are the models to predict the speed of sound. The RS
The third-best model according to its results is the model to predict the kinematic viscosity. In this case, the behavior of the RS
Finally, from all models developed, both surface tension models were the worst models according to their adjustments. These models are the models with the highest dispersion (RS
Due to these poor results, the authors  proposed an alternative ANN model (ANN’
Given the results obtained by the response surface models and the neural models, it can be concluded that the models developed to determine density, speed of sound and kinematic viscosity are suitable for use in the laboratory due to their low AAPD values (between 0.02% and 5.18%, for all the data cases). The models for surface tension prediction, as previously mentioned, cannot be used in the laboratory because they present errors higher than 10%. The alternative ANN model developed by the authors appears to offer acceptable results in terms of determination coefficient and AAPD value, and improves on the original RS and ANN models.
All the models developed can be improved in different ways. The response surface models could be improved by fitting the experimental cases to an experimental design before the measurements are made, which would, on the one hand, save cost and time and, on the other, favor the development of an RS model based on a precise experimental design. It would also be very convenient to develop a response surface model with surfactants that allow the variables to vary evenly over a range. All these changes could improve the models designed to predict the density, the speed of sound and the kinematic viscosity.
Likewise, given the ANN model that uses six input variables, it would be interesting to develop an RS model that includes the predictions of density, speed of sound and kinematic viscosity as input variables (although it would be necessary to see how to treat the different variation of the values within the range under study).
Neural network models could be improved by including different input variables that are capable of better identifying the different surfactants. Another interesting approach could be to enlarge the database used for modeling.
The development of models based on response surfaces and neural networks to predict different physical properties of aqueous surfactant solutions (density, speed of sound, kinematic viscosity and surface tension) can be a good alternative to save money and time in the laboratory.
In general terms, these kinds of models can fit, with accuracy, the density, the kinematic viscosity and the speed of sound, with determination coefficients higher than 0.902 and AAPD values lower than 5.20% (for all data). In contrast to these good adjustments, the surface tension models do not work properly and presented (for all data) low determination coefficients (0.503 and 0.451 for the RS and ANN models, respectively) and high AAPD values (14.73% and 18.13% for the RS and ANN models, respectively). It seems that this problem can be solved, in the case of models based on neural networks, by including new variables taken from the predictions of the previous models. With this modification, the new neural model improves (for all data) each adjustment parameter (0.974 and 2.92% for the determination coefficient and the AAPD value, respectively).
In conclusion, RS and ANN models can be powerful prediction tools for the properties (density, speed of sound, kinematic viscosity or surface tension) of aqueous surfactant solutions. These models could therefore facilitate daily laboratory work, saving time and money. However, it would be interesting to improve the models using other development alternatives or even different approaches, such as support vector machines or random forests, among others.
Gonzalo Astray thanks the University of Vigo for his contract supported by