Contributions of Multivariate Statistics in Oil and Gas Industry

Written By

Leandro Valim de Freitas, Ana Paula Barbosa Rodrigues de Freitas, Fernando Augusto Silva Marins, Estéfano Vizconde Veraszto, José Tarcísio Franco de Camargo, J. Paulo Davim and Messias Borges Silva

Submitted: 07 December 2011 Published: 09 January 2013

DOI: 10.5772/54090

From the Edited Volume

Multivariate Analysis in Management, Engineering and the Sciences

Edited by Leandro Valim de Freitas and Ana Paula Barbosa Rodrigues de Freitas

1. Introduction

This study aims to develop and validate multivariate mathematical models for real-time monitoring of the quality of derivatives processed in an oil refinery.

Methods based heavily on statistics and artificial intelligence, such as multivariate or chemometric methods, have been widely used in the oil industry (Kim; Lee; Kim, 2009). Several articles describe applications of multivariate analysis to predicting the properties of oil derivatives (Santos Junior et al., 2005; Chung, 2007).

Pasadakis, Sourligas and Foteinopoulos (2006) have used the first six principal components of Principal Component Analysis (PCA) as input variables in nonlinear modeling of oil properties.

Pasquini and Bueno (2007) proposed a new approach to predict the true boiling point of oil and its API (American Petroleum Institute) gravity, a measure of the relative density of liquids, using Partial Least Squares (PLS) and Artificial Neural Networks (ANN). Samples of oil mixtures were obtained from various producing regions of Brazil and abroad. In this application, the models obtained by the PLS method were superior to the neural networks. The short time required to predict the properties justifies the proposal of characterizing the oil more quickly in order to monitor refining processes.

Teixeira et al. (2008), working with Brazilian gasoline, used the multivariate algorithm Soft Independent Modeling of Class Analogy (SIMCA) for cluster analysis. The PLS method was applied to quantify the adulteration of gasoline by other hydrocarbons. Finally, the models were validated internally by cross-validation and externally with an independent set of samples.

Bao and Dai (2009) studied different multivariate methods, including linear and nonlinear techniques, in order to minimize the prediction error of models developed for quality control of gasoline. Lira et al. (2010) applied the PLS method to infer quality parameters (density, sulfur concentration and distillation temperatures) of diesel/biodiesel blends, saving considerable time compared with traditional laboratory methods.

Aleme, Corgozinho and Barbeira (2010) used the PCA method to classify diesel oil samples by type and to predict their origin.

Paiva, Ferreira and Balestrassi (2007) combined the Response Surface Method (RSM) of Design of Experiments (DOE) with Principal Component Analysis to optimize multiple correlated responses in a manufacturing process.

Huang, Hsu and Liu (2009) integrated the Mahalanobis-Taguchi system with Artificial Neural Networks for data mining, to find patterns and build models in manufacturing. Pal and Maiti (2010) adopted the Mahalanobis-Taguchi algorithm to reduce the dimensionality of multivariate data, followed by optimization with metaheuristics.

Liu et al. (2007) have made inferences about quality parameters of jet fuel using Multiple Linear Regression (MLR) and ANN. The work showed that the performance of modeling by ANN was superior.

In the optimization of multivariate models, Multivariate Analysis has been combined with metaheuristics such as simulated annealing (Saunier et al., 2009), genetic algorithms (GA) (Roy; Roy, 2009), tabu search (Qi; Shi; Kong, 2010), particle swarm (Pal; Maiti, 2010), and ant colony (Goodarzi; Freitas; Jensen, 2009; Allegrini; Oliveri, 2011).

To optimize the dimensionality of multivariate models and avoid overfitting when determining the number of principal components, Xu and Liang (2001) applied Monte Carlo simulation to simulated data sets and two real cases. Gourvénec et al. (2003) compared Monte Carlo cross-validation with traditional cross-validation for determining the appropriate number of latent variables.
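The idea behind Monte Carlo cross-validation can be sketched in a few lines. The example below is only an illustration, not the procedure of the cited authors: it assumes a principal component regression model, synthetic data with two latent factors, and hypothetical names (`pcr_predict`, `mccv_rmse`) and settings (200 random splits, half of the samples left out).

```python
import numpy as np

def pcr_predict(X_train, y_train, X_test, k):
    """Principal component regression with k components (assumed model)."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    # Rows of Vt are the principal directions of the centered training data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                       # loadings, shape (n_vars, k)
    T = Xc @ P                         # scores, shape (n_train, k)
    coef, *_ = np.linalg.lstsq(T, y_train - y_mean, rcond=None)
    return (X_test - x_mean) @ P @ coef + y_mean

def mccv_rmse(X, y, max_k, n_splits=200, test_fraction=0.5, seed=0):
    """Average validation RMSE over random splits, for k = 1..max_k."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_test = int(round(test_fraction * n))
    mse = np.zeros(max_k)
    for _ in range(n_splits):
        perm = rng.permutation(n)      # a fresh random split each repetition
        test, train = perm[:n_test], perm[n_test:]
        for k in range(1, max_k + 1):
            pred = pcr_predict(X[train], y[train], X[test], k)
            mse[k - 1] += np.mean((y[test] - pred) ** 2)
    return np.sqrt(mse / n_splits)

# Synthetic data (assumption): two latent factors drive 10 collinear variables,
# and the response depends only on the weaker factor.
rng = np.random.default_rng(1)
n, m = 40, 10
basis, _ = np.linalg.qr(rng.normal(size=(m, 2)))   # two orthonormal loading vectors
t1, t2 = rng.normal(size=n), rng.normal(size=n)
X = (3.0 * np.outer(t1, basis[:, 0]) + np.outer(t2, basis[:, 1])
     + 0.01 * rng.normal(size=(n, m)))
y = t2 + 0.01 * rng.normal(size=n)

rmse = mccv_rmse(X, y, max_k=5)
best_k = int(np.argmin(rmse)) + 1
print("RMSE per number of components:", np.round(rmse, 4))
print("Number of components with lowest error:", best_k)
```

With data built this way, the validation error should drop sharply once the second component is included; components beyond that mostly fit noise, which is the overfitting the procedure is meant to expose.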

Adler and Yazhemsky (2010) combined Monte Carlo simulation, PCA and Data Envelopment Analysis (DEA) for decision making in a context where the number of variables is large relative to the number of observations. Llobet et al. (2005), by means of a Multiple Criteria Decision-Making (MCDM) model, used fuzzy classification of samples of chips. To predict oxidative and hydrolytic properties, an electronic nose based on PLS models was used, with prior selection of input variables by a GA metaheuristic.

Wu, Feng and Wen (2011), in studies related to Botany, evaluated the growth suitability of a tree species, Carya cathayensis Sarg., using PCA and the Analytic Hierarchy Process (AHP). They identified the advantages and disadvantages of each method, although the results obtained by both were essentially identical.

Zhang et al. (2006) combined the Preference Ranking Organization Method for Enrichment Evaluations (PROMETHEE), from the Elimination et Choix Traduisant la Réalité (ELECTRE) family, and Geometrical Analysis for Interactive Assistance (GAIA) with the PCA and PLS methods to classify 67 oils and determine an indicator of product quality. Purcell, O'Shea and Kokot (2007) also combined PROMETHEE and GAIA with PCA and PLS in studies of sugarcane clonal performance.

Regarding control charts designed to monitor the mean vector, Machado and Costa (2008) studied the performance of T² charts based on principal components for monitoring multivariate processes. Lourenço et al. (2011) used the principles of Process Analytical Technology (PAT) to construct control charts based on the scores of the first principal component versus time for the on-line monitoring of pharmaceutical processes.

Moreover, Multivariate Analysis is an important technique in various areas of knowledge such as Data Mining (Kettaneh; Berglund; Wold, 2005); Econometrics (Mackay, 2006); Marketing (Ahn; Choi; Han, 2007) and Supply Chain Management (Pozo et al., 2012).


2. Application: Oil refining

The first process in a refinery is atmospheric distillation, or direct distillation, in which the components of crude oil are separated into fractions according to their boiling points. The main products obtained in this process are liquefied petroleum gas (LPG), naphtha (a precursor of gasoline), jet fuel, diesel and fuel oil.

Additionally, refineries usually have a second tower, for vacuum distillation, to produce diesel cuts. These intermediate streams feed a chemical process called Fluid Catalytic Cracking (FCC), which generates two high-value streams: LPG and gasoline. This refining scheme is much more flexible but, although modern, may still have difficulty meeting stricter product specifications.

The level 3 production scheme is more flexible and cost-effective than the previous one because it uses the chemical process of Coking, which transforms a lower-value fraction, the vacuum residue from the distillation towers, into higher-value products such as LPG, gasoline, naphtha and diesel oil.

The final refining scheme incorporates Hydrotreating of the middle fractions generated in the Coker Unit, which increases the supply of good-quality diesel. This scheme allows a more balanced supply of gasoline and diesel oil, producing more diesel and less gasoline than the previous configurations.

There are, of course, other macro-processes and auxiliary processes, such as water treatment, effluent disposal, sulfur recovery units and hydrogen generation units, and consequently other interconnections, whose details are outside the scope of this work (ANP, 2012).


3. Methods

3.1. Acquisition database: Infrared radiation

In the oil industry, infrared radiation signals generated by sensors are used to predict the quality of distillates such as naphtha, gasoline, diesel and jet fuel (Kim; Cho; Park, 2000).

Freitas et al. (2012) and Pasquini (2003) explain this instrumentation (Figure 1): the polychromatic radiation emitted by the source has its wavelengths selected by a Michelson interferometer. The beam splitter has a refractive index such that approximately half of the radiation is directed to the fixed mirror and the other half to the movable mirror; both beams are then reflected back by the mirrors. The movement of the movable mirror creates optical path differences, which produce wave interference.

An interferogram is the graph of the signal intensity received by the detector versus the difference in the optical paths traveled by the beams. By calculating the Fourier Transform (FT), the interferogram can be written as a sum of sines and cosines (Tarumi et al., 2005), and the result is called the transmittance spectrum, T (Forato; Filho; Colnago, 1997). Finally, the transmittance spectrum, T, is converted to the absorbance spectrum, A, by taking the cologarithm of T, that is, A = -log10(T) (Suarez et al., 2011). The absorbance can be interpreted as the amount of radiation that the sample absorbs, and the transmittance as the fraction of radiation that the sample does not absorb. Both depend on the chemical composition of the sample (Kramer; Small, 2007).

Figure 1.

Scheme of the database acquisition technology (adapted from Pasquini, 2003)
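The conversion from transmittance to absorbance described above is a one-line calculation. The sketch below assumes transmittance is expressed as a fraction between 0 and 1; the function name and the sample values are hypothetical.

```python
import math

def absorbance(transmittance: float) -> float:
    """Absorbance as the cologarithm of transmittance: A = -log10(T)."""
    if not 0.0 < transmittance <= 1.0:
        raise ValueError("transmittance must be a fraction in (0, 1]")
    return -math.log10(transmittance)

# Hypothetical transmittance values at three wavelengths
for t in (1.0, 0.5, 0.1):
    print(f"T = {t:.2f} -> A = {absorbance(t):.3f}")
```

A sample that transmits all of the radiation has zero absorbance, and each tenfold drop in transmittance adds one absorbance unit.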

Chemical bonds of the carbon-hydrogen (CH), oxygen-hydrogen (OH) and nitrogen-hydrogen (NH) types, present in petroleum products (Pasquini; Bueno, 2007), are responsible for the absorption of infrared radiation; however, these absorptions are not very intense and they overlap. The broad spectral bands that result are difficult to interpret (Skoog; Holler; Crouch, 2007) because of collinearity (Naes; Martens, 1984). The origin of this phenomenon lies in the way infrared radiation interacts with matter and can be demonstrated by Quantum Mechanics, as discussed by Pasquini (2003).

These input variables (radiation absorbed), called Xi, are correlated and are therefore said to be collinear or multicollinear (Naes et al., 2002). To illustrate collinearity, let X be a dummy matrix with elements aij, where aij is the radiation absorbed by sample i (i = 1, 2, 3; rows) at wavelength j (j = 1, 2; columns).


The columns of X are linearly dependent, so the variables in columns j = 1 and j = 2 are collinear; that is, when the first increases, the second increases proportionally. This causes the determinant of X'X to be zero, where X' is the transpose of matrix X.


$$X'X = \begin{pmatrix} 14 & 28 \\ 28 & 56 \end{pmatrix}$$

Then det(X'X) = (14 × 56) - (28 × 28) = 0. According to Naes et al. (2002), this means that the matrix is singular and that the resulting errors are propagated when the dependent properties, Y, are determined by regression methods that are not based on principal components, such as MLR.
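The singularity can be checked numerically. The matrix X below is an assumption: it is one simple 3 × 2 matrix with proportional columns that yields the products 14, 28 and 56 appearing in the determinant, and it is not necessarily the matrix used by the authors.

```python
import numpy as np

# Assumed illustrative absorbances: 3 samples (rows) at 2 wavelengths (columns).
# The second column is exactly twice the first, so the columns are collinear.
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])

XtX = X.T @ X                       # cross-product matrix X'X
det = np.linalg.det(XtX)            # 14*56 - 28*28, zero up to rounding
rank = np.linalg.matrix_rank(XtX)   # rank 1, below the number of columns

print(XtX)
print("det(X'X) is approximately", det)
print("rank of X'X:", rank)
```

Because X'X has no inverse, the MLR coefficient estimate, which requires inverting X'X, is not defined for this matrix.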

Multivariate approaches such as Principal Component Regression (PCR) and PLS, however, are well suited to this situation because of dimensionality reduction, which creates a new set of variables called principal components (Rajalahti; Kvalheim, 2011). With data mining by Multivariate Analysis, it is therefore possible to relate the physicochemical properties (quality characteristics) of products to the chemical composition of the sample, as reflected in the absorption spectra. Once a property has been modeled, a sample only needs to be subjected to infrared radiation for its properties to be predicted.
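To show why a latent-variable method still works on collinear data, here is a minimal sketch of single-response PLS (PLS1) using the NIPALS-style deflation scheme. It is an illustration under assumptions, not the authors' implementation: the function names, the tiny data set with proportional columns, and the response values are all hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 and return (regression vector, X mean, y mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # centered residual matrices
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                        # weight: direction of max covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                          # scores (latent variable)
        tt = t @ t
        p = E.T @ t / tt                   # X loadings
        c = f @ t / tt                     # y loading
        E = E - np.outer(t, p)             # deflate X
        f = f - c * t                      # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q) # coefficients in the original variables
    return beta, x_mean, y_mean

def pls1_predict(model, X):
    beta, x_mean, y_mean = model
    return (X - x_mean) @ beta + y_mean

# Hypothetical collinear "spectra" (second column = 2 x first) and a response
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
y = np.array([10.0, 20.0, 30.0])

model = pls1_fit(X, y, n_components=1)
print(pls1_predict(model, np.array([[4.0, 8.0]])))
```

One latent variable is enough here because both columns carry the same information; the model never has to invert X'X, so the exact collinearity does not prevent a prediction.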

3.2. Acquisition database: Reference properties

In this work, properties of gasoline, diesel and jet fuel were modeled: the octane number for gasoline, and the kinematic viscosity for diesel oil and jet fuel.

According to Freitas et al. (2012), the kinematic viscosity of diesel oil and jet fuel is an important property because of its effect on the power system and on fuel injection. Both high and low viscosities are undesirable, since they can cause, among other problems, poor fuel atomization. The formation of large droplets (high viscosity) or small droplets (low viscosity) can lead to a poor distribution of fuel and compromise the air-fuel mixture, resulting in incomplete combustion followed by loss of power and greater fuel consumption.

The octane number of a gasoline is an important characteristic related to its ability to burn in spark-ignition engines. It is determined by comparing its tendency to detonate with that of reference fuels of known octane number under standard operating conditions.

To define the octane number required by engines, many countries use the anti-knock index (I), defined by Equation 1:

$$I = \frac{\mathrm{MON} + \mathrm{RON}}{2} \qquad (1)$$

where MON is the Motor Octane Number and RON is the Research Octane Number. The MON method measures the resistance to detonation when gasoline is burned under the most demanding operating conditions and at higher engine speeds. The test is run in single-cylinder CFR (Cooperative Fuel Research) engines with a variable compression ratio, equipped with the necessary instrumentation on a stationary base, as shown in Figure 2.

Figure 2.

CFR engine for MON octane (WAUKESHA, 2012)

The RON method evaluates the resistance of gasoline to detonation under milder conditions and at lower engine speeds than the MON method. The test is run in engines similar to those used for the MON test.

It takes two and a half hours to run the MON test, and the same time to run the RON test.
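The anti-knock index is the arithmetic mean of the two octane numbers, so it can be computed directly once MON and RON are known (or predicted). The function name and the octane values below are hypothetical and serve only to illustrate the arithmetic.

```python
def anti_knock_index(mon: float, ron: float) -> float:
    """Anti-knock index I: arithmetic mean of MON and RON."""
    return (mon + ron) / 2.0

# Hypothetical octane numbers, chosen only to illustrate the calculation
print(anti_knock_index(mon=82.0, ron=92.0))  # 87.0
```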


4. Results and discussion

Samples of gasoline, diesel and jet fuel, collected over one year, were subjected to laboratory tests to determine the input variables, Xi (the infrared radiation absorbed), and the response variables, Yi (the physicochemical properties). The physicochemical properties were then predicted by PLS models.

Table 1 summarizes the validation results of each model for gasoline, diesel and jet fuel, where RMSEP (Root Mean Square Error of Prediction) corresponds to the standard deviation of the residuals (the differences between the measured values and the values predicted by the model).
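The two validation figures reported for each model can be computed as in the sketch below. The measured and predicted viscosities are hypothetical values, not data from this study, and the function names are illustrative. RMSEP computed this way equals the standard deviation of the residuals when the model has no bias.

```python
import math

def rmsep(measured, predicted):
    """Root Mean Square Error of Prediction over a validation set."""
    residuals = [m - p for m, p in zip(measured, predicted)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

def pearson_r(a, b):
    """Pearson correlation between measured and predicted values."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical kinematic viscosities in cSt (reference method vs. model)
measured = [2.0, 3.0, 4.0]
predicted = [2.1, 2.9, 4.0]

print("RMSEP (cSt):", round(rmsep(measured, predicted), 4))
print("Correlation:", round(pearson_r(measured, predicted), 4))
```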

Figures 3-6 show that the residuals of the models are consistent with a normal distribution, since in all cases the p-value of the normality test was greater than 0.05.

| Product | Property | Number of samples | Latent Variables | RMSEP | Correlation |
|---|---|---|---|---|---|
| Diesel Oil | Viscosity (40 ºC) | 180 | 8 | 0.116 cSt | 0.9368 |
| Jet Fuel | Viscosity (-20 ºC) | 279 | 7 | 0.01 cSt | 0.8836 |

Table 1.

Summary of results of modeling and validation.

Figure 3.

Normality Test for the property MON

Figure 4.

Normality Test for the property RON

Figure 5.

Normality Test for the property viscosity (diesel)

Figure 6.

Normality Test for the property viscosity (jet fuel)


5. Conclusions

The following conclusions can be drawn from the results of this study:

It was possible to model mathematically the octane number and viscosity of gasoline, diesel and jet fuel.

The developed models were externally validated according to ASTM D-6122 and their predictions have precision equivalent to the reference methods.

The results were used in an oil refinery and contributed greatly to speeding up decision-making in blending systems. Unlike the laboratory tests, the response time for a property, including the computational time, does not exceed three minutes.


References

  1. Adler N, Yazhemsky E. Improving discrimination in data envelopment analysis: PCA-DEA or variable reduction. European Journal of Operational Research, 2010; 202: 273-284.
  2. AGENCY OF OIL, NATURAL GAS AND BIOFUELS (ANP). Schemes Production in Oil Refining. Accessed on 06 March 2012.
  3. Ahn H, Choi E, Han I. Extracting underlying meaningful features and canceling noise using independent component analysis for direct marketing. Expert Systems with Applications, 2007; 33: 181-191.
  4. Aleme H G, Corgozinho C N C, Barbeira P J S. Diesel oil discrimination by origin and type using physicochemical properties and multivariate analysis. Fuel, 2010; 89: 3151-3156.
  5. Bao X, Dai L. Partial least squares with outlier detection in spectral analysis: A tool to predict gasoline properties. Fuel, 2009; 88: 1216-122.
  6. Forato L A, Filho R B, Colnago L A. Estudos de métodos de aumento de resolução de espectros de FTIR para análises de estruturas secundárias de proteínas. Química Nova, 1997; 20: 146-150.
  7. Freitas L V, Freitas A P B R, Marins F A S, Loures C C A, Silva M B. Multivariate modeling in quality control of viscosity in fuel: an application in oil industry. In: Fuel Injection in Automotive Engineering. Rijeka: InTech, 2012.
  8. Goodarzi M, Freitas M, Jensen R. Ant colony optimization as a feature selection method in the QSAR modeling of anti-HIV-1 activities of 3-(3,5-dimethylbenzyl) uracil derivatives using MLR, PLS and SVM regressions. Chemometrics and Intelligent Laboratory Systems, 2009; 98: 123-129.
  9. Gourvénec S, Pierna J A F, Massart D L, Rutledge. An evaluation of the PoLiSh smoothed regression and the Monte Carlo Cross-Validation for the determination of the complexity of a PLS model. Chemometrics and Intelligent Laboratory Systems, 2003; 68: 48.
  10. Huang C L, Hsu T S, Liu C M. The Mahalanobis-Taguchi system - Neural network algorithm for data-mining. Expert Systems with Applications, 2009; 36: 5475-5480.
  11. Kettaneh N, Berglund A, Wold S. PCA and PLS with very large data sets. Computational Statistics & Data Analysis, 2005; 48: 69-85.
  12. Kim D, Lee J, Kim J. Application of near infrared diffuse reflectance spectroscopy for on-line measurement of coal properties. Korean Journal of Chemical Engineering, 2009: 489-495.
  13. Kim K, Cho I, Park J. Use of real-time NIR (near infrared) spectroscopy for the on-line optimization of a crude distillation unit. In: NPRA, Computer Conference. Chicago, 2000.
  14. Kramer K E, Small G W. Robust absorbance computations in the analysis of glucose by near-infrared spectroscopy. Vibrational Spectroscopy, 2007; 43: 440-446.
  15. Lira L F B, Vasconcelos F V C, Pereira C F, Paim A P S, Stragevitch L, Pimentel M F. Prediction of properties of diesel/biodiesel blends by infrared spectroscopy and multivariate calibration. Fuel, 2010; 89: 405-409.
  16. Liu H, Yu J, Xu J, Fan Y, Bao X. Identification of key oil refining technologies for China National Petroleum Co. (CNPC). Energy Policy, 2007; 35: 2635-2647.
  17. Llobet M V E, Brezmes J, Vilanova X, Correig X. A fuzzy ARTMAP- and PLS-based MS e-nose for the qualitative and quantitative assessment of rancidity in crisps. Sensors and Actuators, 2005; 106: 677-686.
  18. Lourenço V, Herdling T, Reich G, Menezes J C, Lochmann D. Combining microwave resonance technology to multivariate data analysis as a novel PAT tool to improve process understanding in fluid bed granulation. European Journal of Pharmaceutics and Biopharmaceutics, 2011; 78: 513-521.
  19. Machado M A G, Costa A F B. The use of principal components and univariate charts to control multivariate processes. Operational Research, 2008; 28: 173-196.
  20. Mackay D. Chemometrics, econometrics, psychometrics - How best to handle hedonics? Expert Systems with Applications, 2006; 17: 529-535.
  21. Naes T, Isaksson T, Fearn T, Davies T. Partial Least Squares. In: Multivariate calibration and classification. Chichester: NIR Publications, 2002.
  22. Naes T, Martens H. Multivariate calibration II. Chemometric methods. Trends in Analytical Chemistry, 1984; 3: 266-271.
  23. Paiva A P. Metodologia de superfície de resposta e análise de componentes principais em otimização de processos de manufatura com múltiplas respostas correlacionadas. PhD Thesis, Itajubá Federal University, 2006.
  24. Pal A, Maiti J. Development of a hybrid methodology for dimensionality reduction in Mahalanobis-Taguchi system using Mahalanobis distance and binary particle swarm optimization. Expert Systems with Applications, 2010; 37: 1286-1293.
  25. Pasadakis N, Sourligas S, Foteinopoulos Ch. Prediction of the distillation profile and cold properties of diesel fuels using mid-IR spectroscopy and neural networks. Fuel, 2006; 85: 1131-1137.
  26. Pasquini C. Near Infrared Spectroscopy: Fundamentals, Practical Aspects and Analytical Applications. Journal of the Brazilian Chemical Society, 2003; 14: 198-219.
  27. Pasquini C, Bueno A F. Characterization of petroleum using near-infrared spectroscopy: Quantitative modeling for the true boiling point curve and specific gravity. Fuel, 2007; 86: 1927-1934.
  28. Pozo C, Fermenia R R, Caballero J, Gosa G G, Jiménez L. On the use of Principal Component Analysis for reducing the number of environmental objectives in multi-objective optimization: Application to the design of chemical supply chains. Chemical Engineering Science, 2012; 69: 146-158.
  29. Purcell D E, O'Shea M G, Kokot S. Role of chemometrics for at-field application of NIR spectroscopy to predict sugarcane clonal performance. Chemometrics and Intelligent Laboratory Systems, 2007; 87: 113-124.
  30. Qi S, Shi W M, Kong W. Modified tabu search approach for variable selection in quantitative structure-activity relationship studies of toxicity of aromatic compounds. Artificial Intelligence in Medicine, 2010; 49: 61-66.
  31. Rajalahti T, Kvalheim O M. Multivariate data analysis in pharmaceutics: A tutorial review. International Journal of Pharmaceutics, 2011; 417: 280-290.
  32. Roy K, Roy P P. Comparative chemometric modeling of cytochrome 3A4 inhibitory activity of structurally diverse compounds using stepwise MLR, FA-MLR, PLS, GFA, G/PLS and ANN techniques. European Journal of Medicinal Chemistry, 2009; 44: 2913-2922.
  33. Santos Jr V O, Oliveira F C C, Lima D G, Petry A C, Garcia E, Suarez P A Z, Rubim J C. A comparative study of diesel analysis by FTIR, FTNIR and FT-Raman spectroscopy using PLS and artificial neural network analysis. Analytica Chimica Acta, 2005; 547: 188-196.
  34. Saunier O, Bocquet M, Mathieu A, Isnard O. Model reduction via principal component truncation for the optimal design of atmospheric monitoring networks. Atmospheric Environment, 2009; 43: 4940-4950.
  35. Skoog D A, Holler F J, Crouch S R. Principles of Instrumental Analysis. Porto Alegre: Bookman, 2007.
  36. Suarez J R C, Londoño L C P, Reyes M V, Diem M, Tague T J, Rivera S P H. Fourier Transforms - New Analytical Approaches and FTIR Strategies: Open-Path FTIR Detection of Explosives on Metallic Surfaces. Rijeka: InTech, 2011.
  37. Tarumi T, Small G W, Combs R J, Kroutil R T. Infinite impulse response filters for direct analysis of interferogram data from airborne passive Fourier transform infrared spectrometry. Vibrational Spectroscopy, 2005; 37: 39-52.
  38. Teixeira L S G, Oliveira F S, Santos H C S, Cordeiro P W L, Almeida S Q. Multivariate calibration in Fourier transform infrared spectrometry as a tool to detect adulterations in Brazilian gasoline. Fuel, 2008; 87: 346-352.
  39. WAUKESHA. Industrial Engine - Industrial Gas Engine. Accessed on 06 March 2012.
  40. Wu D S, Feng X, Wen Q Q. The Research of Evaluation for Growth Suitability of Carya Cathayensis Sarg. Based on PCA and AHP. Procedia Engineering, 2011; 15: 1879-1883.
  41. Xu Q S, Liang Y Z. Monte Carlo cross validation. Chemometrics and Intelligent Laboratory Systems, 2001; 56: 1-11.
  42. Zhang G, Ni Y, Churchill J, Kokot S. Authentication of vegetable oils on the basis of their physico-chemical properties with the aid of chemometrics. Talanta, 2006; 70: 293-300.
