Summary of results of modeling and validation.

## 1. Introduction

This study aims to develop and validate multivariate mathematical models to monitor, in real time, the quality of the petroleum products (oil derivatives) processed in an oil refinery.

Methods based heavily on statistics and artificial intelligence, such as multivariate or chemometric methods, have been widely used in the oil industry (Kim; Lee; Kim, 2009). Several articles describe applications of multivariate analysis to predict the properties of oil derivatives (Santos Junior et al., 2005; Chung, 2007).

Pasadakis, Sourligas and Foteinopoulos (2006) used the first six principal components from Principal Component Analysis (PCA) as input variables for nonlinear modeling of oil properties.

Pasquini and Bueno (2007) proposed a new approach to predict the true boiling point curve of crude oil and its API (American Petroleum Institute) gravity, a measure of the relative density of liquids, using Partial Least Squares (PLS) and Artificial Neural Networks (ANN). Crude oil samples were obtained from various producing regions in Brazil and abroad. In this application, the PLS models outperformed the neural networks. The short time required to predict these properties justifies the proposed approach as a faster way to characterize crude oil for monitoring refining processes.

Teixeira et al. (2008), working with Brazilian gasoline, used the multivariate algorithm Soft Independent Modeling of Class Analogy (SIMCA) for classification. To quantify the degree of adulteration of gasoline with other hydrocarbons, they applied PLS. Finally, the models were validated internally by cross-validation and externally with an independent set of samples.

Bao and Dai (2009) studied different multivariate methods, including linear and nonlinear techniques, to minimize the prediction error of models developed for gasoline quality control. Lira et al. (2010) applied PLS to infer quality parameters (density, sulfur concentration and distillation temperatures) of diesel/biodiesel blends, saving considerable time compared with traditional laboratory methods.

Aleme, Corgozinho and Barbeira (2010) used PCA to classify samples, discriminating diesel oil by type and predicting its origin.

Paiva, Ferreira and Balestrassi (2007) combined Response Surface Methodology (RSM), from Design of Experiments (DOE), with Principal Component Analysis to optimize multiple correlated responses in a manufacturing process.

Huang, Hsu and Liu (2009) integrated the Mahalanobis-Taguchi system with Artificial Neural Networks for data mining, searching for patterns and building models in manufacturing. Pal and Maiti (2010) adopted the Mahalanobis-Taguchi system to reduce the dimensionality of multivariate data, followed by optimization with metaheuristics.

Liu et al. (2007) inferred quality parameters of jet fuel using Multiple Linear Regression (MLR) and ANN, and found that ANN modeling performed better.

In the optimization of multivariate models, multivariate analysis has been combined with metaheuristics such as simulated annealing (Saunier et al., 2009), genetic algorithms (GA) (Roy; Roy, 2009), tabu search (Qi; Shi; Kong, 2010), particle swarm optimization (Pal; Maiti, 2010) and ant colony optimization (Goodarzi; Freitas; Jensen, 2009; Allegrini; Olivieri, 2011).

To optimize the dimensionality of multivariate models and avoid overfitting when determining the number of principal components, Xu and Liang (2001) applied Monte Carlo cross-validation to simulated data sets and two real cases. Gourvénec et al. (2003) compared Monte Carlo cross-validation with traditional cross-validation for determining the appropriate number of latent variables.
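A minimal sketch of Monte Carlo cross-validation (MCCV) follows, assuming NumPy. The data set is synthetic and hypothetical: three Gaussian absorption bands plus noise, with the property tied to the weakest band. For brevity it uses Principal Component Regression as the calibration model, and the same loop applies to PLS latent variables. The names `pcr_fit`, `pcr_predict` and `mccv_rmse`, the 200 random splits and the 30% test fraction are illustrative choices and are not taken from the cited works.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Fit a Principal Component Regression model with k components."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # The right singular vectors of the centred data are the PCA loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                                  # loadings (n_vars x k)
    T = Xc @ P                                    # scores (n_samples x k)
    q, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return P @ q, x_mean, y_mean                  # coefficients in the original variables

def pcr_predict(model, X):
    b, x_mean, y_mean = model
    return (X - x_mean) @ b + y_mean

def mccv_rmse(X, y, max_k, n_splits=200, test_fraction=0.3, seed=0):
    """Monte Carlo cross-validation: mean test RMSE over repeated random splits."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_fraction * len(y)))
    errors = np.zeros((n_splits, max_k))
    for s in range(n_splits):
        perm = rng.permutation(len(y))
        test, train = perm[:n_test], perm[n_test:]
        for k in range(1, max_k + 1):
            resid = y[test] - pcr_predict(pcr_fit(X[train], y[train], k), X[test])
            errors[s, k - 1] = np.sqrt(np.mean(resid ** 2))
    return errors.mean(axis=0)

# Hypothetical "spectra": three separated Gaussian bands on a 40-point grid,
# mixed with concentrations of decreasing variance, plus a little noise.
rng = np.random.default_rng(1)
grid = np.arange(40)
bands = np.exp(-0.5 * ((grid - np.array([[8], [20], [32]])) / 2.5) ** 2)
conc = rng.normal(size=(60, 3)) * np.array([3.0, 2.0, 1.0])
X = conc @ bands + 0.05 * rng.normal(size=(60, 40))
y = conc[:, 2] + 0.05 * rng.normal(size=60)       # property tied to the weakest band

rmse = mccv_rmse(X, y, max_k=8)
best_k = int(np.argmin(rmse)) + 1
print("MCCV RMSE by number of components:", np.round(rmse, 3))
print("selected number of components:", best_k)
```

With fewer than three components, the band that carries the property is left out and the error stays large. The number of components is then taken at the minimum of the averaged error curve. This is a simple rule, and variants that accept a slightly larger error in exchange for fewer components also exist.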

Adler and Yazhemsky (2010) combined Monte Carlo simulation, PCA and Data Envelopment Analysis (DEA) for decision making in a context where the number of variables is relatively large compared with the number of observations. Llobet et al. (2005) used a Multiple Criteria Decision-Making (MCDM) model with fuzzy classification of samples of crisps. To predict oxidative and hydrolytic properties, they used an electronic nose based on PLS models, with the input variables previously selected by a GA metaheuristic.

Wu, Feng and Wen (2011), in a botany study, compared PCA and the Analytic Hierarchy Process (AHP) for evaluating the growth suitability of the tree species Carya cathayensis Sarg. They identified the advantages and disadvantages of each method, although both produced essentially identical results.

Zhang et al. (2006) combined the Preference Ranking Organization Method for Enrichment Evaluations (PROMETHEE), an outranking method of the same family as ELECTRE (ELimination Et Choix Traduisant la REalité), and Geometrical Analysis for Interactive Assistance (GAIA) with PCA and PLS to classify 67 vegetable oils and determine a product quality indicator. Purcell, O'Shea and Kokot (2007) also combined PROMETHEE and GAIA with PCA and PLS to predict the performance of sugarcane clones.

Regarding control charts designed to monitor the mean vector, Machado and Costa (2008) studied the performance of $T^2$ charts based on principal components for monitoring multivariate processes. Lourenço et al. (2011) applied Process Analytical Technology (PAT) principles to build control charts of first-principal-component scores versus time for the on-line monitoring of pharmaceutical processes.
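The idea of a $T^2$ chart built on principal-component scores can be sketched as follows, assuming NumPy. The reference data, the choice of two components and the control limit are all assumptions made for illustration. The limit uses the large-sample chi-square approximation, which for two degrees of freedom has the closed form $-2\ln\alpha$. Small reference sets usually call for a limit based on the F distribution instead. The function names are hypothetical.

```python
import math
import numpy as np

def fit_pca_monitor(X_ref, n_components=2):
    """PCA of in-control reference data: mean, loadings and score variances."""
    mean = X_ref.mean(axis=0)
    _, s, Vt = np.linalg.svd(X_ref - mean, full_matrices=False)
    loadings = Vt[:n_components].T
    score_var = s[:n_components] ** 2 / (len(X_ref) - 1)   # eigenvalues of the covariance matrix
    return mean, loadings, score_var

def hotelling_t2(X, mean, loadings, score_var):
    """Hotelling T^2 of each observation, computed from its principal-component scores."""
    scores = (X - mean) @ loadings
    return np.sum(scores ** 2 / score_var, axis=1)

alpha = 0.01
ucl = -2.0 * math.log(alpha)   # chi-square upper limit, exact closed form for 2 degrees of freedom

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8, 0.6],
                [0.8, 1.0, 0.7],
                [0.6, 0.7, 1.0]])                      # hypothetical correlated quality variables
X_ref = rng.multivariate_normal(np.zeros(3), cov, size=500)
model = fit_pca_monitor(X_ref, n_components=2)

X_new = rng.multivariate_normal(np.zeros(3), cov, size=20)
X_new[-1] += 8.0                                       # simulated shift of the mean in the last sample
t2 = hotelling_t2(X_new, *model)
print("UCL =", round(ucl, 2))
print("samples above the UCL:", np.flatnonzero(t2 > ucl))
```

This statistic only sees variation inside the retained components. Shifts orthogonal to them are usually tracked with a separate residual statistic (Q or SPE), which is not shown here.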

Moreover, multivariate analysis is an important technique in various fields, such as data mining (Kettaneh; Berglund; Wold, 2005), econometrics (Mackay, 2006), marketing (Ahn; Choi; Han, 2007) and supply chain management (Pozo et al., 2012).

## 2. Application: Oil refining

The first process in a refinery is atmospheric (or direct) distillation, in which the components of crude oil are separated into different fractions according to their boiling points. The main products of this process are liquefied petroleum gas (LPG), naphtha (a gasoline precursor), jet fuel, diesel and fuel oil.

Additionally, refineries usually have a second tower, for vacuum distillation, which produces gas oil cuts. These intermediate streams feed a chemical process called Fluid Catalytic Cracking (FCC), which generates two noble streams: LPG and gasoline. This refining scheme is much more flexible. Although modern, it may have difficulty meeting stricter product specifications.

The level-3 production scheme is more flexible and cost-effective than the previous one because it uses coking. This chemical process transforms a lower-value fraction (the vacuum residue from the distillation towers) into nobler products such as LPG, gasoline, naphtha and diesel oil.

The final refining scheme adds hydrotreating of the middle fractions generated in the coker unit, increasing the supply of good-quality diesel. This scheme allows a more balanced supply of gasoline and diesel oil, producing more diesel and less gasoline than the previous configurations.

There are, of course, other macro-processes and auxiliary processes, such as water treatment, effluent disposal, sulfur recovery and hydrogen generation units, with their corresponding interconnections. These are outside the scope of this work (ANP, 2012).

## 3. Methods

### 3.1. Data acquisition: Infrared radiation

In the oil industry, infrared signals generated by sensors are used to predict the quality of distillates such as naphtha, gasoline, diesel and jet fuel (Kim; Cho; Park, 2000).

Freitas et al. (2012) and Pasquini (2003) describe this instrumentation (Figure 1). The polychromatic radiation emitted by the source is modulated by a Michelson interferometer. The beam splitter has a refractive index such that approximately half of the radiation is directed to the fixed mirror and the other half is reflected to the movable mirror. Both beams are reflected back by the mirrors and recombine at the beam splitter. The movement of the movable mirror creates optical path differences between the two beams, and these differences produce wave interference.

An interferogram is a plot of the signal intensity received by the detector versus the optical path difference between the beams. The Fourier Transform (FT) expresses the interferogram as a sum of sines and cosines (Tarumi et al., 2005), yielding what is then called the transmittance spectrum, T (Forato; Filho; Colnago, 1997). Finally, the transmittance spectrum T is converted into the absorbance spectrum A by taking the cologarithm of T, $A = -\log_{10} T$ (Suarez et al., 2011). Absorbance can be interpreted as the amount of radiation absorbed by the sample, and transmittance as the fraction of radiation that the sample does not absorb. Both depend on the sample's chemical composition (Kramer; Small, 2007).
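The chain interferogram, Fourier transform, transmittance and absorbance can be illustrated with an idealized, noise-free sketch in NumPy. All numbers here are hypothetical. The source emits at only two wavenumbers, placed exactly on FFT bins, and there is no apodization or phase correction. Transmittance is taken as the ratio of the sample and background single-beam spectra, a common convention that the text does not spell out.

```python
import numpy as np

n_points = 1024                  # hypothetical number of sampled mirror positions
delta = np.arange(n_points)      # optical path difference, arbitrary units

def interferogram(lines):
    """Detector signal versus optical path difference: one cosine per emitted wavenumber.

    `lines` maps an FFT bin index (a proxy for wavenumber) to the intensity at that wavenumber.
    """
    return sum(a * np.cos(2 * np.pi * k * delta / n_points) for k, a in lines.items())

def spectrum(signal):
    """Intensity per wavenumber bin recovered with the real-input FFT."""
    return 2.0 * np.abs(np.fft.rfft(signal)) / n_points

background = spectrum(interferogram({50: 1.0, 120: 0.6}))   # beam without the sample
sample = spectrum(interferogram({50: 0.25, 120: 0.3}))      # sample absorbs part of each line

peaks = np.flatnonzero(background > 0.1)
T = sample[peaks] / background[peaks]   # transmittance: fraction of radiation not absorbed
A = -np.log10(T)                        # absorbance: cologarithm of the transmittance
print("lines at bins:", peaks)
print("T =", np.round(T, 3), " A =", np.round(A, 3))
```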

The carbon-hydrogen (C-H), oxygen-hydrogen (O-H) and nitrogen-hydrogen (N-H) bonds present in petroleum products (Pasquini; Bueno, 2007) are responsible for the absorption of infrared radiation. However, these absorptions are weak and overlap. The resulting broad spectral bands are difficult to interpret (Skoog; Holler; Crouch, 2007) because of collinearity (Naes; Martens, 1984). The origin of this phenomenon is associated with the way infrared radiation interacts with matter and can be explained by quantum mechanics (Pasquini, 2003).

These input variables (absorbed radiation), denoted $X_i$, are correlated and are therefore said to be collinear or multicollinear (Naes et al., 2002). To illustrate collinearity, let $X = [a_{ij}]$ be a dummy $3 \times 2$ matrix, where $a_{ij}$ is the radiation absorbed by sample $i$ ($i = 1, 2, 3$) at wavelength $j$ ($j = 1, 2$).

The columns of $X$ are linearly dependent, so the variables in columns $j = 1$ and $j = 2$ are collinear: when the first increases, the second increases proportionally. As a result, the determinant of $X'X$ is zero, where $X'$ is the transpose of $X$.

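Then $X'X = \begin{bmatrix} 14 & 28 \\ 28 & 56 \end{bmatrix}$ and $\det(X'X) = (14)(56) - (28)(28) = 784 - 784 = 0$. According to Naes et al. (2002), this means that $X'X$ is singular. MLR estimates the regression coefficients as $b = (X'X)^{-1}X'Y$, so a singular (or, with real spectra, nearly singular) $X'X$ makes these estimates undefined or highly unstable. Errors are therefore strongly propagated when the dependent properties, $Y$, are determined by regression methods that are not based on principal components, such as MLR.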
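The determinant calculation can be checked numerically with NumPy. The text does not show $X$ itself, so the matrix below is an assumption. It is one $3 \times 2$ matrix whose second column is twice the first, and it reproduces the entries 14, 28 and 56 of $X'X$.

```python
import numpy as np

# Assumed absorbances of 3 samples (rows) at 2 wavelengths (columns);
# the second column is exactly twice the first, i.e. perfectly collinear.
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])

XtX = X.T @ X                       # X'X = [[14, 28], [28, 56]]
print(XtX)
print(np.linalg.det(XtX))           # 0 (possibly printed as -0.0)
print(np.linalg.matrix_rank(XtX))   # only one independent direction

# MLR needs (X'X)^-1; a singular X'X cannot be inverted.
invertible = True
try:
    np.linalg.inv(XtX)
except np.linalg.LinAlgError as err:
    invertible = False
    print("X'X cannot be inverted:", err)
```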

However, multivariate approaches such as Principal Component Regression (PCR) and PLS are well suited to this problem because they reduce dimensionality. They do this by creating, from the original variables, a new set of uncorrelated variables: the principal components, or latent variables in PLS (Rajalahti; Kvalheim, 2011). Multivariate analysis therefore makes it possible to relate the physicochemical properties (quality characteristics) of products to the chemical composition of the sample, as reflected in its absorption spectrum. Once a property has been modeled, it is enough to subject a new sample to infrared radiation to predict that property.
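A minimal Principal Component Regression sketch, assuming NumPy, shows how regressing on principal-component scores sidesteps the singular $X'X$. The toy data use hypothetical values, with the second column twice the first, like the dummy matrix discussed above. They collapse onto a single component, and the regression is solved on that one score. PLS follows the same idea but also uses the response when choosing its components. The function name `pcr` is illustrative.

```python
import numpy as np

def pcr(X, y, n_components):
    """Principal Component Regression: regress y on the first PCA scores of X.

    Only components with non-zero variance should be kept (here, one).
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                 # loadings
    T = Xc @ P                              # scores: new, uncorrelated variables
    # Scores are mutually orthogonal, so each coefficient is a simple projection.
    q = (T.T @ (y - y_mean)) / np.sum(T ** 2, axis=0)
    b = P @ q                               # back to one coefficient per wavelength

    def predict(X_new):
        return (X_new - x_mean) @ b + y_mean

    return predict

X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])   # collinear: det(X'X) = 0
y = np.array([1.0, 2.0, 3.0])                        # hypothetical property values

predict = pcr(X, y, n_components=1)
print(predict(X))                          # reproduces y despite the singular X'X
print(predict(np.array([[4.0, 8.0]])))     # a new sample along the same direction
```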

### 3.2. Data acquisition: Reference properties

This work modeled properties of gasoline, diesel oil and jet fuel: the octane number of gasoline and the kinematic viscosity of diesel oil and jet fuel.

According to Freitas et al. (2012), the kinematic viscosity of diesel oil and jet fuel is important because of its effect on the fuel feed and injection systems. Both high and low viscosities are undesirable because, among other problems, they can impair fuel atomization. Large droplets (high viscosity) or very small droplets (low viscosity) can lead to poor fuel distribution and compromise the air-fuel mixture. The result is incomplete combustion, loss of power and greater fuel consumption.

The octane number of a gasoline is an important characteristic related to its resistance to knocking (detonation) in spark-ignition engines. It is determined by comparing the gasoline's tendency to knock with that of reference fuels of known octane number under standard operating conditions.

To define the octane quality required by engines, many countries use the anti-knock index ($I$), defined by Equation 1:

$$I = \frac{\mathrm{MON} + \mathrm{RON}}{2} \qquad (1)$$

where MON is the Motor Octane Number and RON is the Research Octane Number. The MON method measures resistance to knocking when the gasoline is burned under more severe operating conditions and at higher engine speeds. The test is performed in CFR (Cooperative Fuel Research) engines, as shown in Figure 2. These are single-cylinder engines with a variable compression ratio, equipped with the necessary instrumentation and mounted on a stationary base.

The RON method evaluates the gasoline's resistance to knocking under milder conditions and at a lower engine speed than the MON method. The test is performed in engines similar to those used for the MON test.

Each MON test takes two and a half hours, and each RON test takes the same time.

## 4. Results and discussion

Samples of gasoline, diesel and jet fuel collected over one year were analyzed in the laboratory to determine the input variables, $X_i$ (the absorbed infrared radiation), and the response variables, $Y_i$ (the physicochemical properties). The physicochemical properties are predicted by PLS models.
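The text does not state which PLS algorithm or software was used. As an assumption, the sketch below implements single-response PLS (PLS1) with the NIPALS algorithm in NumPy and validates it on held-out samples. The spectra, property values, split sizes and number of components are hypothetical and are not the study's data. The names `pls1_fit` and `pls1_predict` are illustrative.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression (one response) fitted with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # mean-centred X and y residuals
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)              # weights: direction of maximal covariance with y
        t = E @ w                           # scores of this latent variable
        tt = t @ t
        p = E.T @ t / tt                    # X loadings
        c = f @ t / tt                      # y loading
        E = E - np.outer(t, p)              # deflate X
        f = f - c * t                       # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)     # coefficients: b = W (P'W)^-1 q
    return b, x_mean, y_mean

def pls1_predict(model, X):
    b, x_mean, y_mean = model
    return (X - x_mean) @ b + y_mean

# Hypothetical spectra: three overlapping Gaussian bands on 100 wavelengths.
rng = np.random.default_rng(42)
wavelengths = np.linspace(0.0, 1.0, 100)
bands = np.exp(-0.5 * ((wavelengths - np.array([[0.2], [0.5], [0.8]])) / 0.12) ** 2)
conc = rng.uniform(0.0, 1.0, size=(80, 3))
X = conc @ bands + 0.01 * rng.normal(size=(80, 100))
y = 2.0 * conc[:, 0] + 0.5 * conc[:, 1] + 0.01 * rng.normal(size=80)

# Calibrate on the first 60 samples and validate externally on the last 20.
model = pls1_fit(X[:60], y[:60], n_components=3)
resid = y[60:] - pls1_predict(model, X[60:])
pls_rmsep = float(np.sqrt(np.mean(resid ** 2)))
print("validation RMSEP:", round(pls_rmsep, 4))
```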

Table 1 summarizes the validation results of each model for gasoline, diesel and jet fuel. RMSEP (Root Mean Square Error of Prediction) is the square root of the mean squared residual, where a residual is the difference between a measured value and the value predicted by the model. It approximates the standard deviation of the residuals when the prediction bias is negligible.
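The relationship between RMSEP and the standard deviation of the residuals can be made concrete with a short NumPy sketch. The measured and predicted values below are hypothetical. The name SEP for the bias-corrected standard deviation (computed with an $n - 1$ denominator) follows common chemometric usage, and the function names are illustrative.

```python
import numpy as np

def rmsep(y_measured, y_predicted):
    """Root Mean Square Error of Prediction: sqrt of the mean squared residual."""
    resid = np.asarray(y_measured, dtype=float) - np.asarray(y_predicted, dtype=float)
    return float(np.sqrt(np.mean(resid ** 2)))

def bias_and_sep(y_measured, y_predicted):
    """Mean residual (bias) and standard deviation of the residuals (SEP, n - 1 denominator)."""
    resid = np.asarray(y_measured, dtype=float) - np.asarray(y_predicted, dtype=float)
    return float(resid.mean()), float(resid.std(ddof=1))

measured = np.array([88.1, 87.6, 89.0, 88.4, 87.9])   # hypothetical octane values
predicted = np.array([88.3, 87.4, 88.8, 88.6, 87.8])  # hypothetical model predictions
r = rmsep(measured, predicted)
bias, sep = bias_and_sep(measured, predicted)
print(f"RMSEP = {r:.4f}, bias = {bias:.4f}, SEP = {sep:.4f}")
```

For any data set, $\mathrm{RMSEP}^2 = \mathrm{bias}^2 + \frac{n-1}{n}\,\mathrm{SEP}^2$. The two measures therefore coincide only when the bias is negligible and $n$ is large.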

Figures 3-6 show that the model residuals are consistent with a normal distribution. In all cases the p-value was greater than 0.05, so normality was not rejected.
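The text does not name the normality test behind these p-values. As an assumption, this sketch uses the Jarque-Bera test, implemented in NumPy. The test combines sample skewness and kurtosis, and for large samples its statistic is compared with a chi-square distribution with two degrees of freedom, whose tail probability is exactly $e^{-x/2}$. Shapiro-Wilk and Anderson-Darling are common alternatives. The residuals here are simulated.

```python
import math
import numpy as np

def jarque_bera(residuals):
    """Jarque-Bera normality test; returns (statistic, asymptotic p-value)."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    d = r - r.mean()
    m2 = np.mean(d ** 2)
    skew = np.mean(d ** 3) / m2 ** 1.5
    kurt = np.mean(d ** 4) / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square(2), with survival function exp(-x / 2).
    return jb, math.exp(-jb / 2.0)

rng = np.random.default_rng(7)
samples = {
    "normal": rng.normal(0.0, 0.1, size=300),      # hypothetical well-behaved residuals
    "skewed": rng.exponential(0.1, size=300),      # hypothetical skewed residuals
}
results = {}
for name, resid in samples.items():
    results[name] = jarque_bera(resid)
    jb, p = results[name]
    print(f"{name}: JB = {jb:.2f}, p = {p:.3f}, normality rejected at 5%: {p < 0.05}")
```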

| Product | Property | | | RMSEP | |
|---|---|---|---|---|---|
| Diesel oil | Viscosity (40 °C) | 180 | 8 | 0.116 cSt | 0.9368 |
| Gasoline | MON | 350 | 6 | 0.22 | 0.8723 |
| Gasoline | RON | 350 | 7 | 0.22 | 0.9891 |
| Jet fuel | Viscosity (-20 °C) | 279 | 7 | 0.01 cSt | 0.8836 |

## 5. Conclusions

The following conclusions can be drawn from the results of this study:

It was possible to model mathematically the octane numbers (MON and RON) of gasoline and the kinematic viscosity of diesel oil and jet fuel.

The developed models were externally validated according to ASTM D6122, and their predictions have precision equivalent to that of the reference methods.

The results were used in an oil refinery and contributed substantially to faster decision-making in blending systems. Laboratory tests take two and a half hours per octane method. By contrast, the response time for a property, including computation time, does not exceed three minutes.

## References

- Adler, N.; Yazhemsky, E. Improving discrimination in data envelopment analysis: PCA-DEA or variable reduction.
- AGENCY OF OIL, NATURAL GAS AND BIOFUELS (ANP). Schemes of Production in Oil Refining. Available at: <http://www.anp.gov.br/?pg=7854&m=esquema+de+refino&t1=&t2=esquema+de+refino&t3=&t4=&ar=0&ps=1&cachebust=1331008874709>. Accessed 6 March 2012.
- Ahn, H.; Choi, E.; Han, I. Extracting underlying meaningful features and canceling noise using independent component analysis for direct marketing. Expert Systems with Applications, v. 33, p. 181-191, 2007.
- Aleme, H. G.; Corgozinho, C. N. C.; Barbeira, P. J. S. Diesel oil discrimination by origin and type using physicochemical properties and multivariate analysis.
- Bao, X.; Dai, L. Partial least squares with outlier detection in spectral analysis: A tool to predict gasoline properties.
- Forato, L. A.; Filho, R. B.; Colnago, L. A. Estudos de métodos de aumento de resolução de espectros de FTIR para análises de estruturas secundárias de proteínas [Studies of resolution-enhancement methods for FTIR spectra in the analysis of protein secondary structures]. Química Nova, v. 20, p. 146-150, 1997.
- Freitas, L. V.; Freitas, A. P. B. R.; Marins, F. A. S.; Loures, C. C. A.; Silva, M. B. Multivariate modeling in quality control of viscosity in fuel: an application in oil industry. In: Fuel Injection in Automotive Engineering. Rijeka: InTech, 2012.
- Goodarzi, M.; Freitas, M.; Jensen, R. Ant colony optimization as a feature selection method in the QSAR modeling of anti-HIV-1 activities of 3-(3,5-dimethylbenzyl) uracil derivatives using MLR, PLS and SVM regressions. Chemometrics and Intelligent Laboratory Systems, v. 98, p. 123-129, 2009.
- Gourvénec, S.; Pierna, J. A. F.; Massart, D. L.; Rutledge, D. N. An evaluation of the PoLiSh smoothed regression and the Monte Carlo Cross-Validation for the determination of the complexity of a PLS model.
- Huang, C. L.; Hsu, T. S.; Liu, C. M. The Mahalanobis-Taguchi system: Neural network algorithm for data-mining. Expert Systems with Applications, v. 36, p. 5475-5480, 2009.
- Kettaneh, N.; Berglund, A.; Wold, S. PCA and PLS with very large data sets.
- Kim, D.; Lee, J.; Kim, J. Application of near infrared diffuse reflectance spectroscopy for on-line measurement of coal properties.
- Kim, K.; Cho, I.; Park, J. Use of real-time NIR (near infrared) spectroscopy for the on-line optimization of a crude distillation unit. In: NPRA Computer Conference. Chicago, 2000.
- Kramer, K. E.; Small, G. W. Robust absorbance computations in the analysis of glucose by near-infrared spectroscopy.
- Lira, L. F. B.; Vasconcelos, F. V. C.; Pereira, C. F.; Paim, A. P. S.; Stragevitch, L.; Pimentel, M. F. Prediction of properties of diesel/biodiesel blends by infrared spectroscopy and multivariate calibration.
- Liu, H.; Yu, J.; Xu, J.; Fan, Y.; Bao, X. Identification of key oil refining technologies for China National Petroleum Co. (CNPC). Energy Policy, v. 35, p. 2635-2647, 2007.
- Llobet, M. V. E.; Brezmes, J.; Vilanova, X.; Correig, X. A fuzzy ARTMAP- and PLS-based MS e-nose for the qualitative and quantitative assessment of rancidity in crisps.
- Lourenço, V.; Herdling, T.; Reich, G.; Menezes, J. C.; Lochmann, D. Combining microwave resonance technology to multivariate data analysis as a novel PAT tool to improve process understanding in fluid bed granulation.
- Machado, M. A. G.; Costa, A. F. B. The use of principal components and univariate charts to control multivariate processes. Operational Research, v. 28, p. 173-196, 2008.
- Mackay, D. Chemometrics, econometrics, psychometrics: How best to handle hedonics? Expert Systems with Applications, v. 17, p. 529-535, 2006.
- Naes, T.; Isaksson, T.; Fearn, T.; Davies, T. Partial Least Squares. In: Multivariate calibration and classification. Chichester: NIR Publications, 2002.
- Naes, T.; Martens, H. Multivariate calibration II. Chemometric methods. Trends in Analytical Chemistry, v. 3, p. 266-271, 1984.
- Paiva, A. P. Metodologia de superfície de resposta e análise de componentes principais em otimização de processos de manufatura com múltiplas respostas correlacionadas [Response surface methodology and principal component analysis in the optimization of manufacturing processes with multiple correlated responses]. PhD Thesis, Itajubá Federal University, 2006.
- Pal, A.; Maiti, J. Development of a hybrid methodology for dimensionality reduction in Mahalanobis-Taguchi system using Mahalanobis distance and binary particle swarm optimization.
- Pasadakis, N.; Sourligas, S.; Foteinopoulos, Ch. Prediction of the distillation profile and cold properties of diesel fuels using mid-IR spectroscopy and neural networks.
- Pasquini, C. Near Infrared Spectroscopy: Fundamentals, Practical Aspects and Analytical Applications. Journal of the Brazilian Chemical Society, v. 14, p. 198-219, 2003.
- Pasquini, C.; Bueno, A. F. Characterization of petroleum using near-infrared spectroscopy: Quantitative modeling for the true boiling point curve and specific gravity.
- Pozo, C.; Fermenia, R. R.; Caballero, J.; Gosa, G. G.; Jiménez, L. On the use of Principal Component Analysis for reducing the number of environmental objectives in multi-objective optimization: Application to the design of chemical supply chains.
- Purcell, D. E.; O'Shea, M. G.; Kokot, S. Role of chemometrics for at-field application of NIR spectroscopy to predict sugarcane clonal performance.
- Qi, S.; Shi, W. M.; Kong, W. Modified tabu search approach for variable selection in quantitative structure-activity relationship studies of toxicity of aromatic compounds. Artificial Intelligence in Medicine, v. 49, p. 61-66, 2010.
- Rajalahti, T.; Kvalheim, O. M. Multivariate data analysis in pharmaceutics: A tutorial review.
- Roy, K.; Roy, P. P. Comparative chemometric modeling of cytochrome 3A4 inhibitory activity of structurally diverse compounds using stepwise MLR, FA-MLR, PLS, GFA, G/PLS and ANN techniques.
- Santos Jr, V. O.; Oliveira, F. C. C.; Lima, D. G.; Petry, A. C.; Garcia, E.; Suarez, P. A. Z.; Rubim, J. C. A comparative study of diesel analysis by FTIR, FTNIR and FT-Raman spectroscopy using PLS and artificial neural network analysis.
- Saunier, O.; Bocquet, M.; Mathieu, A.; Isnard, O. Model reduction via principal component truncation for the optimal design of atmospheric monitoring networks. Atmospheric Environment, v. 43, p. 4940-4950, 2009.
- Skoog, D. A.; Holler, F. J.; Crouch, S. R. Principles of Instrumental Analysis. Porto Alegre: Bookman, 2007.
- Suarez, J. R. C.; Londoño, L. C. P.; Reyes, M. V.; Diem, M.; Tague, T. J.; Rivera, S. P. H. Open-Path FTIR Detection of Explosives on Metallic Surfaces. In: Fourier Transforms: New Analytical Approaches and FTIR Strategies. Rijeka: InTech, 2011.
- Tarumi, T.; Small, G. W.; Combs, R. J.; Kroutil, R. T. Infinite impulse response filters for direct analysis of interferogram data from airborne passive Fourier transform infrared spectrometry.
- Teixeira, L. S. G.; Oliveira, F. S.; Santos, H. C. S.; Cordeiro, P. W. L.; Almeida, S. Q. Multivariate calibration in Fourier transform infrared spectrometry as a tool to detect adulterations in Brazilian gasoline.
- WAUKESHA. Industrial Engine: Industrial Gas Engine. Available at: <http://www.dresserwaukesha.com/index.cfm/go/list-products/productline/CFR-F1 F2-octane-category/>. Accessed 6 March 2012.
- Wu, D. S.; Feng, X.; Wen, Q. Q. The Research of Evaluation for Growth Suitability of Carya Cathayensis Sarg. Based on PCA and AHP.
- Xu, Q. S.; Liang, Y. Z. Monte Carlo cross validation. Chemometrics and Intelligent Laboratory Systems, v. 56, p. 1-11, 2001.
- Zhang, G.; Ni, Y.; Churchill, J.; Kokot, S. Authentication of vegetable oils on the basis of their physico-chemical properties with the aid of chemometrics.