3 Thermooxidative Properties of Biodiesels and Other Biological Fuels



Introduction
The aim of this chapter is to show how the thermooxidative properties of biological fuels can be evaluated by pressure differential scanning calorimetry (PDSC) and used to classify the fuels studied correctly. The onset oxidation temperature (OOT), which can be determined by the ASTM E2009 method, is an important parameter for estimating oxidation stability. In addition to the OOT, however, other meaningful information can be extracted from the PDSC tests. That additional information provides a better understanding of the thermooxidative process and makes it possible to identify subtle differences between similar fuels. The following sections show that the features extracted from the heat flow curves obtained by PDSC characterize each type of fuel and differentiate it from the others, provided that adequate statistical tools are applied. The proposed statistical analysis of the PDSC curves thus makes it possible to classify the fuel types chosen for this study: two types of biodiesel, five edible oils and two wood species. The statistical study consisted of the application of analysis of variance (ANOVA) procedures and a simulation study based on parametric bootstrap and multivariate supervised classification methods such as linear discriminant analysis (LDA), logistic regression and the Naïve Bayes classifier.
Studying the thermooxidative properties of a fuel is important for several reasons. For example, vegetable oils are protected against oxidation by antioxidants that are removed during the biodiesel production process. For this reason biodiesel is not stable: it is susceptible to oxidation to a greater or lesser extent depending on several factors, including the presence of air, temperature, light, and the presence of hydroperoxides and antioxidants (Dunn, 2005; Knothe & Dunn, 2003; Knothe, 2007). The products resulting from the oxidation of biodiesel can damage internal combustion engines, so it is essential to study the oxidation stability of biodiesels. In the case of vegetable oils, oxidation at the relatively high temperatures of frying can significantly affect the healthiness of food when the same oil is used repeatedly (Vorria et al., 2004). For this reason, thermal stability to oxidation is an important parameter for oils. The study of the thermooxidative characteristics of wood species is not as common as that of oils or biodiesel. It is nevertheless justified, because it allows the resistance to combustion in an oxidizing atmosphere to be estimated under conditions similar to those of a wildfire.
The thermal analysis techniques used to measure thermooxidative stability are thermogravimetry (TG), differential scanning calorimetry (DSC) and PDSC. With TG, oxidation stability can be assessed by measuring (a) the increase in sample weight due to the absorption of oxygen, (b) the temperature corresponding to the maximum weight and (c) the temperature at the beginning of oxidation (Van Aardt et al., 2004). DSC and PDSC can be applied to study the exothermic oxidation process. PDSC provides results in a shorter time than DSC and also reduces evaporation of the sample. It is also important to note that PDSC makes it possible to estimate the oxidation stability under pressures similar to those operating in a diesel engine.
The values determined by DSC and PDSC to study oxidation stability are the oxidation induction time (OIT) and the onset oxidation temperature (OOT). High values of both parameters are related to high oxidative stability. The two methods have been used by several authors to study the oxidation stability of biodiesel (Knothe, 2007; Moser et al., 2007; Dunn, 2006; Xu et al., 2007; Polavka et al., 2005), finding correlations with other procedures (Dunn, 2005; Tan, 2002). The OOT measures the degree of oxidative stability of a substance at a constant heating rate, at both high pressure and high temperature. It is a non-isothermal, dynamic method. The procedure for calculating the OOT is explained in ASTM E2009 (2008). Recent results concerning the thermooxidative characterization of fuels such as biodiesel or edible oils can be found in Tarrío-Saavedra et al. (2010), Artiaga et al. (2010) and López-Beceiro et al. (2011).

Materials
In the present chapter, three different types of fuel are tested:
1. Two types of biodiesel, obtained from soybean and from palm.
2. Four classes of vegetable oil: soy, sunflower, corn and olive, the last represented by two Spanish varieties named hojiblanca and picual.
3. Two species of commercial wood: Pinus sylvestris (Scots pine) and Eucalyptus globulus.

Biodiesel
Biodiesel is a liquid biofuel made from natural fats, such as vegetable oils or animal fats, through esterification and transesterification. The resulting substance can be used as a partial replacement for petroleum products. The base oils react with a low molecular weight alcohol in the presence of a catalyst (usually sodium hydroxide), yielding long-chain fatty acid mono-alkyl esters, which are very similar to petroleum-derived diesel. The commercial biodiesels used today are mixed with other fuels. In this chapter two types of pure biodiesel have been studied, one obtained from soybean oil and the other from palm oil. They were supplied by Entaban Biofuels Galicia, SA (Ferrol, Spain). See Table 1.

Data collection
A design of experiments consisting of one factor (type of fuel) at nine levels (soy biodiesel, palm biodiesel, sunflower oil, soy oil, corn oil, hojiblanca olive oil, picual olive oil, eucalyptus wood and Scots pine wood) was carried out. Three samples of each fuel type were tested. This sampling scheme is a compromise between capturing the existing variability and minimizing the experimental time. The tests were carried out by PDSC to study the oxidation stability of the fuels and to compare the materials on that basis. The PDSC tests were performed in a TA Instruments pressure cell mounted on a Q2000 modulated DSC. The experimental conditions were the following: open T-zero aluminum pan; a heating rate of 10 °C min⁻¹ from room temperature to 300 °C, taking into account the recommendations for obtaining a better oxidation peak (Riesen & Schawe, 2006); sample mass in the 3-3.30 mg range; and an oxygen pressure of 3.5 MPa with a flow rate of 50 mL min⁻¹, according to the ASTM E2009 method. The experiments were stopped manually once the end of the exotherm was reached.
The Universal Analysis software supplied by TA Instruments was used to calculate the OOT according to the E2009 standard. The standard defines the OOT as the temperature at the intersection of two tangents to the heat flow curve: the tangent at the point of maximum slope and the tangent to the curve just before the onset of the oxidation peak (which coincides with the baseline).
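The tangent construction described above can be illustrated with a minimal Python sketch. It is not the Universal Analysis implementation: the function names, the use of finite differences for the slope, and the choice of fitting the baseline to the first `n_baseline` points are all assumptions made here for illustration, and the curve is synthetic.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def onset_temperature(temps, heat_flow, n_baseline):
    """Temperature where the baseline tangent meets the tangent at maximum slope."""
    # Baseline tangent: line fitted to the first n_baseline points (an assumption).
    base_slope, base_icpt = fit_line(temps[:n_baseline], heat_flow[:n_baseline])
    # Finite-difference slopes of heat flow vs. temperature between neighbours.
    slopes = [
        (heat_flow[i + 1] - heat_flow[i]) / (temps[i + 1] - temps[i])
        for i in range(len(temps) - 1)
    ]
    i_max = max(range(len(slopes)), key=slopes.__getitem__)
    # Tangent at the point of maximum slope.
    tan_slope = slopes[i_max]
    tan_icpt = heat_flow[i_max] - tan_slope * temps[i_max]
    # Intersection of the two straight lines.
    return (base_icpt - tan_icpt) / (tan_slope - base_slope)


# Synthetic curve (not measured data): flat baseline, then a linear exotherm
# starting at 200 degrees C.
temps = [100.0 + i for i in range(151)]
heat_flow = [0.0 if t <= 200.0 else 2.0 * (t - 200.0) for t in temps]
oot = onset_temperature(temps, heat_flow, n_baseline=50)
print(oot)  # 200.0 for this synthetic curve
```

With real PDSC signals the derivative is noisy, so some smoothing before locating the maximum slope would probably be needed.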

Analysis of variance (ANOVA)
Analysis of variance is a statistical tool used to study the dependence of a quantitative variable on one or more qualitative variables. In this chapter, an experimental design consisting of one nine-level factor was used. The quantitative variable is the OOT, an indicator of oxidation stability, while the factor is the type of fuel. The F test makes it possible to test whether the mean OOT values of all the fuel types are statistically equal or whether, on the contrary, at least one mean differs from the others (Maxwell, 2004). The significance level used in this work is 0.05.
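As an illustration of the F test for a one-factor design, the following Python sketch computes the F statistic from the between-group and within-group sums of squares. The function name and the numbers are hypothetical (the values are made up, not the measured OOT data), and the p-value, which requires the F distribution, is left out.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df between, df within) for a one-factor design."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variability of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up values for three hypothetical fuel classes, three samples each.
groups = [[181.0, 182.0, 183.0], [182.0, 183.0, 184.0], [185.0, 186.0, 187.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f_stat, df_between, df_within)
```

A large F value relative to the F distribution with (df between, df within) degrees of freedom leads to rejecting the hypothesis of equal means.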

Classification methods
The process of assigning a p-dimensional observation to one of several predetermined groups is called supervised classification. The principal aim is to obtain a discriminant function that summarizes the information contained in the p variables that define a sample, so that each observation can be correctly assigned to its group. Several methods for addressing the classification problem can be found in the statistical literature.

Linear discriminant analysis
One of the most popular classification techniques was proposed by Fisher (Fisher, 1936). This approach, called linear discriminant analysis (LDA), divides the sample space into subspaces by means of hyperplanes that best separate the groups studied. The assumptions for the use of LDA are multivariate normality and equal covariance matrices between groups. Under these assumptions, LDA is based on finding a linear combination of features that describes or separates two or more classes of objects or events. The resulting combination can be used as a linear classifier or, more commonly, to reduce the dimension of the problem before a subsequent classification.
LDA is closely related to other statistical techniques such as analysis of variance (ANOVA) and regression analysis. In these two techniques, however, the dependent variable is numerical, while in LDA it is categorical (the class labels). Other statistical procedures related to LDA are principal component analysis (PCA) and factor analysis (FA), which are used to look for the linear combinations of variables that best explain the data.
Although the terms LDA and Fisher linear discriminant analysis are commonly used to indicate the same supervised classification procedure, the original work of Fisher (Fisher, 1936) does not in fact require the assumptions of normality and equal covariances made by LDA.
LDA has been successfully applied in fields as diverse as engineering, economics, computing science and biology. Recently, it has been applied to the classification of weeds (Lopez-Granados et al., 2008) and to the classification of different wood species using features extracted by image processing (Mallik et al., 2011).
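For reference, under the assumptions of multivariate normality and a common covariance matrix, the LDA rule is usually written in textbooks with one linear score per class. The symbols below are notation introduced here, not taken from the chapter: \mu_g is the mean vector of class g, \Sigma the common covariance matrix and \pi_g the prior probability of class g.

```latex
\delta_g(x) = x' \Sigma^{-1} \mu_g - \tfrac{1}{2}\, \mu_g' \Sigma^{-1} \mu_g + \log \pi_g ,
\qquad
\hat{g}(x) = \arg\max_{g} \ \delta_g(x)
```

Because each score is linear in x, the boundary between two classes g and h, where \delta_g(x) = \delta_h(x), is a hyperplane.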

Logistic regression
The logistic model is applied in those cases where the explanatory variables (or set of features) do not follow a multivariate normal distribution (McLachlan, 2004).
Considering only two classes (C1 and C2), the logistic regression equation used to solve this classification problem is the following:
$$ p = \frac{1}{1 + e^{-(\beta_0 + \beta' x)}} \qquad (1) $$
where p = P(Y = C1 | x) is the posterior probability that Y belongs to class C1, log(p/(1−p)) = β_0 + β'x is the logit transformation, x is the p-dimensional vector of features or explanatory variables, β_0 is the intercept, β is a vector of p parameters and p/(1−p) is the odds. In the present study, however, a classification model that can handle more than two classes is needed. This can be solved using the logit model generalized to more than two populations, i.e. to a qualitative response with more than two possible classes. If G populations are assumed and p_ig is defined as the probability that observation i belongs to class g, the generalized model can be written as expression (2). Logistic regression has been applied to classify wood species using features extracted by image segmentation (Mallik et al., 2011).
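For reference, a common textbook way of writing the multiclass logit model and the resulting comparisons between classes is sketched below. It assumes that class G is taken as the reference class, and the symbols \beta_{0g} and \beta_g (intercept and coefficient vector of class g) are notation chosen here, not necessarily that of the chapter's own expressions (2) and (3).

```latex
p_{ig} = \frac{\exp(\beta_{0g} + \beta_g' x_i)}
              {1 + \sum_{j=1}^{G-1} \exp(\beta_{0j} + \beta_j' x_i)},
\quad g = 1, \dots, G-1,
\qquad
p_{iG} = \frac{1}{1 + \sum_{j=1}^{G-1} \exp(\beta_{0j} + \beta_j' x_i)}

\log \frac{p_{ig}}{p_{ij}} = (\beta_{0g} - \beta_{0j}) + (\beta_g - \beta_j)' x_i
```

For G = 2 the first expression reduces to the two-class logistic model, since \exp(z)/(1 + \exp(z)) = 1/(1 + \exp(-z)).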

Naïve Bayes classifier
The Naïve Bayes classifier is a supervised multivariate classification technique based on Bayes' theorem, particularly suitable when the dimension of the feature (input) vectors is considerably high. The aim is to calculate the posterior probability of each possible class given a vector of features X = {x_1, x_2, ..., x_d}. That is, Bayes' rule is used to calculate the probability that a sample belongs to a particular class C_j, from a group of possible classes C = {c_1, c_2, ..., c_k}, given the particular values of the characteristics that define the sample. Using Bayes' rule, the probability that X belongs to C_j, or posterior probability, is
$$ p(C_j \mid x_1, x_2, \ldots, x_d) \propto p(x_1, x_2, \ldots, x_d \mid C_j)\, p(C_j) \qquad (4) $$
The class of the sample is then estimated as the class with the largest posterior probability. Since Naïve Bayes assumes that the predictor variables are conditionally independent given the class, the posterior probabilities can be rewritten as
$$ p(C_j \mid X) \propto p(C_j) \prod_{i=1}^{d} p(x_i \mid C_j) \qquad (5) $$
In addition, because of this independence assumption, the density estimation problem is reduced to one dimension: each p(x_i | C_j) can be estimated with a one-dimensional kernel estimator. The Naïve Bayes classifier can also be modeled with normal, log-normal, gamma and Poisson density functions.
The Naïve Bayes method appeared in the 1980s and is the most popular supervised classification method based on Bayes' rule. Several variants and extensions of the Naïve Bayes classifier have been developed. For example, Cestnik (Cestnik, 1990) developed the m-estimation of the posterior probabilities, and Kononenko (Kononenko, 1991) designed a semi-naïve Bayesian classifier that goes beyond the "naïve" assumption and detects dependencies between attributes. The advantage of fuzzy discretization of continuous attributes in the Naïve Bayes classifier is described by Kononenko (Kononenko, 1992). Langley (Langley, 1993) studied a system that uses the Naïve Bayes classifier at the nodes of decision trees. Other recent works are those of Webb et al. (Webb et al., 2005) and Mozina et al. (Mozina et al., 2004). This technique has been used successfully in spam classification and in areas such as medicine (for medical diagnosis, among other tasks), acoustics (automatic classification of sound and voice) and image classification (Kononenko, 2001; Tóth et al., 2005; Mallik et al., 2011).
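A minimal Python sketch of a Naïve Bayes classifier with normal densities is given below, as one possible reading of the description above. The function names and the two-class, two-feature data are hypothetical and made up for illustration; the variances are maximum likelihood estimates (dividing by n), which is a choice made here.

```python
import math


def fit_gaussian_nb(samples, labels):
    """Estimate a prior, per-feature means and per-feature variances for each class."""
    model = {}
    for c in set(labels):
        rows = [x for x, y in zip(samples, labels) if y == c]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Maximum likelihood variance (divides by n); must be non-zero.
        variances = [
            sum((v - m) ** 2 for v in col) / n for col, m in zip(columns, means)
        ]
        model[c] = (n / len(samples), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the largest (log) posterior for feature vector x."""
    def log_posterior(c):
        prior, means, variances = model[c]
        # Independence assumption: the log-likelihood is a sum over features.
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances)
        )
        return math.log(prior) + log_lik
    return max(model, key=log_posterior)


# Made-up feature vectors for two hypothetical classes "A" and "B".
train_x = [[1.0, 10.0], [1.2, 11.0], [0.8, 9.0],
           [3.0, 20.0], [3.2, 21.0], [2.8, 19.0]]
train_y = ["A", "A", "A", "B", "B", "B"]
model = fit_gaussian_nb(train_x, train_y)
print(predict_gaussian_nb(model, [1.1, 10.5]))
```

Working with log probabilities avoids numerical underflow when many features are multiplied together.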

K nearest neighbors (KNN)
K nearest neighbors (KNN) is a non-parametric supervised classification method that has been used successfully in populations where the assumption of normality does not hold. This assumption is required by traditional techniques such as linear discriminant analysis. The KNN procedure can be summarized in the following three steps:
1. A distance between samples (represented by feature vectors) is defined, usually the Euclidean or the Mahalanobis distance.
2. The distances between the test sample, x_0, and the other samples are calculated.
3. The k samples nearest to the one to be classified are selected, and the proportion of these k samples belonging to each of the studied populations is calculated. Finally, the sample x_0 is assigned to the population with the highest frequency.
Among the different methods available for choosing the value of k, minimization of the cross-validation error is one of the most widely used.
The KNN method was introduced by Fix and Hodges (Fix & Hodges, 1951). Later, Cover and Hart established some of the formal properties of this procedure, for example that the classification error rate is bounded by twice the Bayes error when an infinite number of samples is available and k is equal to 1 (Cover & Hart, 1967). Once the formal properties of the classifier had been developed, a line of research was established that continues to this day. Notable contributions include the work of Hellman (Hellman, 1970), which presents a new approach to rejection; Fukunaga and Hostetler (Fukunaga & Hostetler, 1975), which sets out refinements with respect to the Bayes error rate; and Dudani (Dudani, 1976) and Bailey and Jain (Bailey & Jain, 1978), in which new approaches to the use of weighted distances were established. Other interesting work on the subject is related to soft computing (Bermejo & Cabestany, 2000) and fuzzy methods (Jozwik, 1983; Keller et al., 1985). Recent papers of interest are those of Bremner et al. (Bremner et al., 2005), Nigsch et al. (Nigsch et al., 2006), Hall et al. (Hall et al., 2008) and Toussaint (Toussaint, 2005). There are also very interesting applications of this algorithm to the analysis of functional data (Ferraty & Vieu, 2006). The development of computing tools in recent years has led to the successful use of KNN in fields as diverse as chemistry, biology, medicine, computer science, genetics and materials science (Tarrío-Saavedra et al., 2011; Mallik et al., 2011).
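The three KNN steps can be sketched in a few lines of Python. This is only an illustration with the Euclidean distance: the function name and the made-up data are assumptions, and ties between classes are resolved arbitrarily by the order in which the neighbours are counted.

```python
import math
from collections import Counter


def knn_classify(train_x, train_y, x0, k):
    """Assign x0 to the most frequent class among its k nearest training samples."""
    # Steps 1 and 2: Euclidean distance from x0 to every training sample.
    distances = sorted(
        (math.dist(x, x0), label) for x, label in zip(train_x, train_y)
    )
    # Step 3: count the classes of the k nearest samples and take the majority.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up feature vectors for two hypothetical classes "A" and "B".
train_x = [[1.0, 10.0], [1.2, 11.0], [0.8, 9.0],
           [3.0, 20.0], [3.2, 21.0], [2.8, 19.0]]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(train_x, train_y, [1.1, 10.5], k=3))
```

When the features have very different scales, standardizing them or using the Mahalanobis distance would keep a single feature from dominating the distance.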

Validation procedure: Leave-one-out cross validation
Classifying samples with supervised classification methods requires training and testing data extracted from the observed instances. Each instance in the training set consists of a class label and a vector of several sample features. The aim of the classification methods is to produce a model from the training sample that estimates the class labels of the instances in the testing set, for which only the features are known. Leave-one-out cross-validation is the procedure used to obtain the probabilities of correct classification for each test sample and, therefore, to compare the different classification methods proposed. It is a technique widely used for the validation of empirical models and is especially suitable for small sample sizes. The procedure consists of the following steps:
1. One instance is left out: the testing sample.
2. A model is obtained using the remaining samples (the training sample).
3. The developed model is used to classify the left-out instance.
This sequence is repeated until every instance has been left out once. The proportions of correct classification (expressed per unit) are obtained with this procedure. All the classification methods were implemented using the R statistical package.
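The leave-one-out steps can be sketched as follows. The chapter's analysis was done in R; this Python version is only an illustration, with hypothetical function names, made-up data and a 1-nearest-neighbour rule used as a stand-in for any of the classifiers discussed.

```python
import math


def nearest_neighbor(train_x, train_y, x0):
    """1-nearest-neighbour rule, used here only as a stand-in classifier."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x0))
    return train_y[best]


def leave_one_out_accuracy(xs, ys, classify):
    """Proportion of instances classified correctly when each is left out once."""
    hits = 0
    for i in range(len(xs)):
        # Step 1: leave instance i out; step 2: train on the remaining ones.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        # Step 3: classify the left-out instance and compare with its label.
        if classify(train_x, train_y, xs[i]) == ys[i]:
            hits += 1
    return hits / len(xs)


# Made-up feature vectors for two hypothetical classes "A" and "B".
xs = [[1.0, 10.0], [1.2, 11.0], [0.8, 9.0],
      [3.0, 20.0], [3.2, 21.0], [2.8, 19.0]]
ys = ["A", "A", "A", "B", "B", "B"]
accuracy = leave_one_out_accuracy(xs, ys, nearest_neighbor)
print(accuracy)
```

Passing the classifier as a function keeps the validation loop independent of the method being compared.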

Parametric bootstrap resampling
In the parametric bootstrap, the model that generated the original sample is known or assumed, i.e. the type of distribution is known. Successive resamples are therefore obtained by replacing the parameters of the probability distribution of the studied variables with their maximum likelihood estimates, calculated from the original sample. In the present chapter the features are assumed to be normally distributed. In addition, the chosen features are assumed to be independent of each other, i.e. a diagonal covariance matrix is assumed. Under these assumptions the mechanism that generates the data is known, so new data can be generated from the sample means and variances of the original sample. This makes it possible to carry out a simulation study to evaluate the discriminating power of the PDSC heat flow curves and the features extracted from them.
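A minimal Python sketch of this resampling scheme, under the stated assumptions of normality and independent features, is shown below. The function name, the seed and the three made-up feature vectors are hypothetical; the standard deviation used is the maximum likelihood estimate (dividing by n), and `statistics.stdev` could be substituted if the n-1 version is preferred.

```python
import random
import statistics


def parametric_bootstrap(samples, n_new, rng):
    """Draw n_new feature vectors from independent normals fitted to the samples."""
    columns = list(zip(*samples))
    means = [statistics.fmean(col) for col in columns]
    # Maximum likelihood estimate of the standard deviation (divides by n).
    sds = [statistics.pstdev(col) for col in columns]
    # Diagonal covariance: each feature is generated independently of the others.
    return [[rng.gauss(m, s) for m, s in zip(means, sds)] for _ in range(n_new)]


# Three made-up feature vectors standing in for one fuel class.
original = [[1.0, 10.0], [1.2, 11.0], [0.8, 9.0]]
rng = random.Random(0)  # fixed seed so that the sketch is reproducible
resampled = parametric_bootstrap(original, 100, rng)
print(len(resampled), len(resampled[0]))
```

Repeating this for every class yields the enlarged data set on which the classifiers can be compared.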

Results
The PDSC curves obtained according to ASTM E2009 are shown in Fig. 1. They represent the heat flow vs. temperature signals of the nine fuels, obtained using a heating ramp. Fig. 1 shows that the curves differ to a lesser or greater extent depending on the class of fuel tested. At first glance there seem to be two main groups: the first consists of the oils and biodiesels studied, and the second consists of the wood species. It is clear that the OOT values of the first group are significantly lower than those of the second group. But are there differences in oxidation stability (measured by the OOT parameter, see Fig. 2) within these main groups? To answer this question, two well-known statistical tools, the F test and the Tukey test, were used. The F test confirms with statistical significance that at least one class of fuel has a mean OOT different from the others (p-value ≈ 0 < 0.05). The Tukey test shows which fuels are statistically different with respect to the OOT variable. Table 3 shows the result of the Tukey test. Each column represents a group of fuels that differs from the others on the basis of the OOT value. For example, group 1 contains three fuels whose OOT values are not significantly different (p-value = 0.154 > 0.05). There are no differences between the OOT of soy oil and that of soy biodiesel. However, both differ from all the olive oil varieties tested and from the wood fuels studied. In fact, the olive varieties form an independent group. On the other hand, the OOT of palm biodiesel is statistically different from that of sunflower oil, picual olive oil and the wood species. It is important to note that the high OOT values obtained for the wood species may condition the results for the remaining fuel classes. The mean OOT values, which allow the fuels to be sorted from largest to smallest OOT, are given in Table 3. Focusing on soy oil and soy biodiesel, the mean OOT of soy oil is higher than that of soy biodiesel, in agreement with theory. However, when a large number of fuels covering a wide range of OOT values is compared, the variance of the OOT measurements can prevent the different fuels from being distinguished. The OOT is an important parameter that contains much information about the oxidation stability of a fuel but, as has been observed, the OOT by itself is not enough to distinguish between all the fuels studied.
More information about the PDSC curves is needed to classify the different fuels correctly. Additional features are therefore chosen: the maximum slope of heat flow versus time (slope max, V), the temperature at that point of maximum slope (T at max slope, H) and the slope of the heat flow curve vs. temperature in the range from 5 to 10 W g⁻¹ (slope between 5 and 10, m). Figs. 2 and 3 show the additional features extracted from the PDSC signals.
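One way the three additional features could be computed from a sampled curve is sketched below in Python. The chapter does not state how the slopes were obtained, so the finite differences for V and H and the least-squares fit for m are assumptions, as are the function name and the synthetic curve.

```python
def extract_features(times, temps, heat_flow):
    """Return (V, H, m) from sampled time, temperature and heat flow values."""
    # V: largest finite-difference slope of heat flow versus time.
    rates = [
        (heat_flow[i + 1] - heat_flow[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]
    i_max = max(range(len(rates)), key=rates.__getitem__)
    v = rates[i_max]
    # H: temperature at the point of maximum slope.
    h = temps[i_max]
    # m: least-squares slope of heat flow vs. temperature where 5 <= HF <= 10 W/g.
    window = [(t, q) for t, q in zip(temps, heat_flow) if 5.0 <= q <= 10.0]
    mean_t = sum(t for t, _ in window) / len(window)
    mean_q = sum(q for _, q in window) / len(window)
    sxy = sum((t - mean_t) * (q - mean_q) for t, q in window)
    sxx = sum((t - mean_t) ** 2 for t, _ in window)
    m = sxy / sxx
    return v, h, m


# Synthetic curve (not measured data): 10 degrees C per minute, sampled every
# 6 s, flat until 250 degrees C and then rising by 0.5 W/g per degree.
times = [6.0 * i for i in range(101)]   # seconds
temps = [200.0 + i for i in range(101)]  # degrees C
heat_flow = [0.0 if t <= 250.0 else 0.5 * (t - 250.0) for t in temps]
v, h, m = extract_features(times, temps, heat_flow)
print(v, h, m)
```

In this sketch V is expressed in W g⁻¹ s⁻¹ because the synthetic times are in seconds.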
Moreover, a large number of samples is recommended in a supervised classification problem, and only three samples of each fuel are available (see the PDSC curves in Fig. 1).
Fig. 2. OOT, H and V features extracted from the PDSC heat flow signal and its derivative.
A simulation study is therefore presented to evaluate the classification power of the chosen features. Parametric bootstrap resampling is used to increase the sample size to 100 items per fuel class. The parametric bootstrap generates new values of the OOT and the other chosen features, assuming that they are independently distributed according to a Gaussian distribution whose mean is the sample mean and whose variance is the sample variance. Leave-one-out cross-validation is used to validate the empirical model. It makes it possible to estimate the probabilities of correct classification of the different classification methods. It works by leaving out one sample (represented by the above-mentioned features); a model is then trained with the remaining samples and, finally, the developed model is used to classify the sample left out. This is repeated 900 times, until all the vectors have been left out once. Table 4 shows the probability of correct classification obtained by the above-mentioned classification methods. These probabilities are very high, regardless of the method used. The best result corresponds to logistic regression (99.7%), with which almost all the samples are correctly classified. Table 5 shows the confusion matrices obtained with the logistic regression, LDA, Naïve Bayes and KNN classification methods. The percentage of simulated samples correctly classified is shown on the diagonal of each matrix. The percentages of confusion between pairs of fuel types are presented off the diagonal. The small confusions between the two types of wood and between palm biodiesel and hojiblanca olive oil are resolved by the logistic regression method. According to these results, the OOT and the other features are very useful parameters for classification purposes.

Conclusion
The thermooxidative stability of nine different types of fuel (including two types of biodiesel, soy and palm) has been measured using the OOT parameter. The use of the OOT parameter and ANOVA techniques makes it possible to differentiate several groups of fuels: the varieties of olive oil, the two types of wood and, finally, the remaining fuels (although sunflower oil is slightly different). However, the OOT by itself is not enough to distinguish between all the fuels studied with statistical significance. Classifying the nine fuels according to their thermooxidative properties has been possible using multivariate supervised classification methods and, as the data set, additional features extracted from the PDSC curves: the maximum slope of heat flow versus time (slope max), the temperature at that point of maximum slope (T at max slope) and the slope of the heat flow curve vs. temperature in the range from 5 to 10 W g⁻¹ (slope between 5 and 10).
That additional information provides a better understanding of the thermooxidative process and makes it possible to identify subtle differences between similar fuels. The discriminating power of the extracted thermooxidative features has been evaluated using parametric bootstrap resampling.
Expression (2) of the logistic regression model provides an estimator of the posterior probabilities, i.e. the probabilities of belonging to a specific class given the values of a vector of features (values of x). The p_ig, or posterior probabilities, satisfy a multivariate logistic distribution. Expression (3) is used to make the different possible comparisons.

Fig. 3. The slopes of the heat flow curves vs. temperature in the range from 5 to 10 W g⁻¹ (slope between 5 and 10, m).

Table 5. Confusion matrices (prediction percentages) obtained by each classification method with leave-one-out cross-validation, using the features extracted from the PDSC signals. The feature data set was tested with 9 classes or types of fuel. The results are shown as percentages.

Table 2. Chemical composition of sunflower, soybean, corn and olive oil, retrieved from the USDA National Nutrient Database for Standard Reference, Release 22 (USDA, 2009).

Table 3. Mean values of OOT in the homogeneous subsets for the fuel class factor. The different groups were obtained by applying the Tukey test (with significance level α = 0.05).

Table 4. Percentages of correct classification obtained by the three proposed methods. The best results are obtained by logistic regression.