Process Monitoring Using Data-Based Fault Detection Techniques: Comparative Studies

Data-based monitoring methods are often used to carry out fault detection (FD) when process models are not available. Partial least squares (PLS) and principal component analysis (PCA) are two basic multivariate FD methods; however, both can only be used to monitor linear processes. Among the nonlinear extensions of these data-based methods, kernel PCA (KPCA) and kernel PLS (KPLS) are the most well-known and widely adopted. KPCA and KPLS models have several advantages: they do not require nonlinear optimization, since only the solution of an eigenvalue problem is required. They also provide a better understanding of what kind of nonlinear features are extracted: the number of principal components (PCs) in the feature space is fixed a priori by selecting the appropriate kernel function. Therefore, the objective of this work is to use KPCA and KPLS techniques to monitor nonlinear data. The improved FD performance of KPCA and KPLS is illustrated through two simulated examples, one using synthetic data and the other using simulated continuously stirred tank reactor (CSTR) data. The results demonstrate that both KPCA and KPLS provide better detection than their linear versions.


Introduction
Process monitoring is an essential aspect of nearly all industrial processes, often required both to ensure safe operation and to maintain product quality. Process monitoring is generally carried out in two phases: detection and diagnosis. This chapter focuses only on the fault detection aspect. Fault detection methods can be categorized in a number of different ways. One popular categorization is into quantitative model-based methods, qualitative model-based methods, and data (process history)-based methods [1][2][3]. Figure 1 illustrates a general schematic of the fault detection phase.
Quantitative model-based methods require knowledge of the process model, while qualitative model-based methods require expert knowledge of the given process. Hence, data-based methods are often used as they require neither prior knowledge of the process model nor expert knowledge of the process [4].
Data-based monitoring methods can be further classified into input model-based methods and input-output model-based methods. Input model-based methods only require the data matrix of the input process variables, while input-output model-based methods require both the input and output data matrices in order to formulate a model and carry out fault detection [5]. Input model-based methods are sometimes utilized when the input-output models cannot be formed due to the high dimensionality and complexity of a system being monitored [6]. However, input-output model-based methods do have the added advantage of being able to detect faults in both the process and the variables [5].
Principal component analysis (PCA) is a widely used input model-based method that has been used for monitoring a number of processes including air quality [7], water treatment [8], and semiconductor manufacturing [9]. On the other hand, partial least squares (PLS) is an input-output model-based method that has been applied in chemical processes to monitor online measurement variables and also to monitor and predict the output quality variable [10]. PLS has been applied for the monitoring of distillation columns, batch reactors [11], continuous polymerization processes [12], and other similar industrial processes that are described by input-output models. However, both PCA and PLS are fault detection techniques that only work reasonably well with linear data. PCA and PLS have been extended to handle nonlinear data by utilizing kernels to transform the data to a higher dimensional space, where linear relationships between variables can be drawn. The extensions, kernel principal component analysis (KPCA) and kernel partial least squares (KPLS), have both shown improved performance over the conventional PCA and PLS techniques when handling nonlinear data [5,13]. The $T^2$ and Q charts are commonly used as fault detection statistics. In the literature, the $T^2$ test has been found to be a less effective fault detection technique than the Q statistic, because the $T^2$ test only represents the variation of the data in the principal component subspace and not in the residual of the model [14].
In our previous works [5,13,15], we addressed the problem of fault detection using a generalized likelihood ratio test (GLRT) based on linear and nonlinear input models (PCA and kernel PCA) and input-output models (PLS and kernel PLS), in which the PCA, kernel PCA, PLS, and kernel PLS methods are used for modeling and the univariate GLRT chart is used for fault detection. In the current work, we propose to use the PCA, kernel PCA, PLS, and kernel PLS methods for multivariate fault detection through their multivariate charts, Q and $T^2$. The fault detection performance is evaluated using two examples, one using simulated synthetic data and the other utilizing a simulated continuous stirred tank reactor (CSTR) model.
The remainder of this chapter is organized as follows. Section 1 introduces linear PCA and PLS, along with the fault detection indices used for these methods. Section 2 then describes the idea of using kernels for nonlinear transformation of data, along with the kernel fault detection extensions: KPCA and KPLS. In Section 3, two illustrative examples are presented, one using simulated synthetic data and the other utilizing a simulated continuous stirred tank reactor. At the end, the conclusions are presented in Section 4.

Conventional linear fault detection methods
Before constructing either the PCA or PLS models, data are generally preprocessed to ensure that all process variables in the data matrix are scaled to zero mean and unit variance. This step is essential as different process variables are usually measured with varying standard deviations and means and often using different units.
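The scaling step can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `fit_scaler` and `apply_scaler` and the two example variables are hypothetical, and it assumes that the mean and standard deviation are estimated on the training data and then reused for the testing data.

```python
import numpy as np

def fit_scaler(X_train):
    """Return per-variable mean and sample standard deviation of the training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    return mean, std

def apply_scaler(X, mean, std):
    """Scale data to zero mean and unit variance using training statistics."""
    return (X - mean) / std

rng = np.random.default_rng(0)
# Two hypothetical process variables recorded in very different units
X_train = np.column_stack([rng.normal(350.0, 5.0, 200), rng.normal(0.8, 0.05, 200)])
mean, std = fit_scaler(X_train)
Z_train = apply_scaler(X_train, mean, std)
```

Reusing the training statistics for new data keeps a shift in the testing data visible instead of scaling it away.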

Principal component analysis (PCA)
Consider the input data matrix $X \in \mathbb{R}^{n \times m}$, where $m$ and $n$ represent the number of process variables and the number of observations, respectively. After preprocessing the data, singular value decomposition (SVD) can be used to express the input data matrix as follows:

$$X = T P^T,$$

where $T = [t_1, t_2, \ldots, t_m] \in \mathbb{R}^{n \times m}$ is the matrix of transformed variables, in which each column is a score vector, and $P = [p_1, p_2, \ldots, p_m] \in \mathbb{R}^{m \times m}$ is a matrix of orthogonal vectors, in which each column is a loading vector. The loading vectors are the eigenvectors of the covariance matrix of the input data matrix $X$. The covariance matrix can be computed as follows [13]:

$$\Sigma = \frac{1}{n-1} X^T X = P \Lambda P^T, \qquad P P^T = P^T P = I_m,$$

where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$ is a diagonal matrix that contains the eigenvalues related to the $m$ principal components ($\lambda_1 > \lambda_2 > \ldots > \lambda_m$), and $I_m$ is the identity matrix [16]. It should be noted that the full PCA model uses the same number of principal components as the number of process variables in the input data matrix ($m$). However, since many industrial processes contain process variables that are highly correlated, a smaller number of principal components can be used to capture the variation in the process data [6]. The quality of the model built by PCA is dictated by the number of principal components retained. Overestimating the number could introduce noise that may mask important features in the data, while underestimating the number could decrease the prediction ability of the model [17].
Therefore, selection of the number of principal components is vital, and several methods have been developed for this purpose. A few popular techniques are cumulative percent variance (CPV) [13], scree plot and profile likelihood [18], and cross validation [19]. CPV is commonly utilized due to its computational simplicity and because it provides a good estimate of the number of principal components that need to be retained for most practical applications. CPV can be computed as follows [13]:

$$\mathrm{CPV}(l) = \frac{\sum_{i=1}^{l} \lambda_i}{\sum_{i=1}^{m} \lambda_i} \times 100.$$

CPV is used to select the smallest number of principal components that represents a certain percentage of the total variance (e.g., 99%). Once the number of principal components to retain is determined, the input data matrix can then be expressed as [13]:

$$X = T P^T = [\hat{T} \;\; \tilde{T}]\,[\hat{P} \;\; \tilde{P}]^T,$$

where $\hat{T} \in \mathbb{R}^{n \times l}$ and $\tilde{T} \in \mathbb{R}^{n \times (m-l)}$ represent the matrices containing the $l$ retained principal components and the ignored $(m - l)$ principal components, respectively. Likewise, the matrices that contain the $l$ retained eigenvectors and the ignored $(m - l)$ eigenvectors are represented by $\hat{P} \in \mathbb{R}^{m \times l}$ and $\tilde{P} \in \mathbb{R}^{m \times (m-l)}$, respectively.
After expansion, $X$ can be expressed as [13]

$$X = \hat{T}\hat{P}^T + \tilde{T}\tilde{P}^T = \hat{X} + E,$$

where matrix $\hat{X}$ is the modeled variation of $X$ computed using only the $l$ retained principal components, while matrix $E$ represents the residual space formed by variations that correspond to process noise.
The PCA model can be illustrated as shown in Figure 2.
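The PCA construction described in this subsection can be sketched in a few lines. This is an illustrative example under stated assumptions rather than the authors' implementation: the function name `pca_cpv` and the synthetic two-factor data set are hypothetical, and the data are assumed to be standardized before the SVD.

```python
import numpy as np

def pca_cpv(X, cpv_target=99.0):
    """PCA of standardized data X (n x m); keep the smallest l with CPV >= cpv_target."""
    n = X.shape[0]
    # Rows of Vt are the loading vectors (eigenvectors of the covariance matrix)
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigvals = s**2 / (n - 1)                      # eigenvalues of X^T X / (n - 1)
    cpv = 100.0 * np.cumsum(eigvals) / eigvals.sum()
    l = min(int(np.searchsorted(cpv, cpv_target)) + 1, len(eigvals))
    P_hat = Vt[:l].T                              # retained loadings (m x l)
    return P_hat, eigvals, cpv, l

rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))
# Five correlated variables driven by two latent factors plus small noise
X = latent @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(300, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

P_hat, eigvals, cpv, l = pca_cpv(X)
X_hat = X @ P_hat @ P_hat.T                       # modeled variation
E = X - X_hat                                     # residual matrix
```

Because the variables are standardized, the eigenvalues sum to the number of variables, which gives a quick sanity check on the decomposition.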

Partial least squares (PLS)
PLS is a popular input-output technique used for modeling, regression, and classification, which has been extended to fault detection [20]. PLS relates the process variables ($X \in \mathbb{R}^{n \times m}$) and the quality variables ($Y \in \mathbb{R}^{n \times p}$) through a linear relationship between the input and output score vectors. The nonlinear iterative partial least squares (NIPALS) algorithm developed by Wold et al. is used to compute the score matrices and loading vectors [21]:

$$X = T P^T + E, \qquad Y = U Q^T + F,$$

where $E \in \mathbb{R}^{n \times m}$ and $F \in \mathbb{R}^{n \times p}$ are the PLS model residuals; $T \in \mathbb{R}^{n \times M}$ and $U \in \mathbb{R}^{n \times M}$ are the orthonormal input and output score matrices, respectively; $P$ and $Q$ are the loading matrices of the input ($X$) and output ($Y$) matrices, respectively; $m$ and $n$ are the number of process variables and observations in the input ($X$) matrix; $p$ is the number of quality variables in the output ($Y$) matrix; and $M$ is the total number of latent variables extracted. The NIPALS method is shown in Algorithm 1. The $X$ and $Y$ matrices are first standardized to zero mean and unit variance. The NIPALS algorithm is initialized by assigning one of the columns of the output matrix ($Y$) as the output score vector ($u$); at each iteration, $t$, $u$, $p$, and $q$ are computed and stored, until $M$ latent variables are extracted.
Another modification to the NIPALS algorithm has been published in the literature [22], and different variations of the PLS technique have been reported in other work. Qin et al. have presented a recursive PLS model [23], in which the PLS model is updated with new training data; MacGregor et al. [24] have developed a multiblock PLS model to monitor subsections of process variables. For process monitoring of batch processes, PLS has been extended to the multiway PLS technique [25], which incorporates past batches in the training data set.
PLS, being an input-output model, can also be used as a regression tool to predict the quality variable ($Y$) from the online measurement variable ($X$). In the PLS model, the input and output matrices are related through the regression coefficient $B$ (Eq. (7)), which is computed as shown in Eq. (8). Substituting the weights, the loading vector $P$, and the constant $C$ from Algorithm 1 gives Eq. (9), and from Eqs. (7) and (9) the output matrix ($Y$) is predicted.

Algorithm 1: Modified NIPALS algorithm
1. Initialize the output score: $u = y_i$.
2. Weights regressed on X: $w = X^T u / (u^T u)$.
6. Input loading vector: $p = X^T t / (t^T t)$.
7. Output loading vector: $q = Y^T t / (t^T t)$.
9. Normalize weights, loading vectors, and scores.
10. Deflate the input and output matrices.
11. Store latent score vectors in T and U, loading vectors in P and Q.
12. Repeat steps 2 to 11 until M latent variables are computed.
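For orientation, the sketch below implements the textbook NIPALS iteration for PLS. It is not the modified algorithm of Algorithm 1, whose normalization and constants are not reproduced here; the function name `nipals_pls`, the convergence tolerance, and the regression formula $B = W (P^T W)^{-1} Q^T$ are the standard choices and should be read as assumptions.

```python
import numpy as np

def nipals_pls(X, Y, n_latent, tol=1e-10, max_iter=500):
    """Textbook NIPALS for PLS; X (n x m) and Y (n x p) are assumed mean centered."""
    X, Y = X.copy(), Y.copy()
    n, m = X.shape
    T, U = np.zeros((n, n_latent)), np.zeros((n, n_latent))
    W, P = np.zeros((m, n_latent)), np.zeros((m, n_latent))
    Q = np.zeros((Y.shape[1], n_latent))
    for a in range(n_latent):
        u = Y[:, [0]]                          # initialize u with a column of Y
        for _ in range(max_iter):
            w = X.T @ u / (u.T @ u)            # weights regressed on X
            w /= np.linalg.norm(w)
            t = X @ w                          # input score vector
            q = Y.T @ t / (t.T @ t)            # output loading vector
            u_new = Y @ q / (q.T @ q)          # output score vector
            converged = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = X.T @ t / (t.T @ t)                # input loading vector
        X -= t @ p.T                           # deflate X
        Y -= t @ q.T                           # deflate Y
        T[:, [a]], U[:, [a]], W[:, [a]], P[:, [a]], Q[:, [a]] = t, u, w, p, q
    B = W @ np.linalg.solve(P.T @ W, Q.T)      # regression coefficients, Y_hat = X @ B
    return T, U, P, Q, B

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
Y = X @ np.array([[1.0], [-2.0], [0.5]]) + 0.01 * rng.normal(size=(100, 1))
X, Y = X - X.mean(axis=0), Y - Y.mean(axis=0)
T, U, P, Q, B = nipals_pls(X, Y, n_latent=3)
```

When all latent variables are extracted, as in this small example, the PLS regression coefficients coincide with the ordinary least squares solution; retaining fewer latent variables trades some fit for robustness to collinearity.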

Fault detection indices
Different fault detection indices can be used for the linear PCA and PLS techniques. The two most popular indices are the $T^2$ and Q statistics. The $T^2$ statistic measures the variation within the model, while the Q statistic measures the variation in the residual space. These statistics are described next.

$T^2$ statistic
The $T^2$ statistic measures the variation in the principal components at different time samples and is defined as follows [26,27]:

$$T^2 = x^T \hat{P} \hat{\Lambda}^{-1} \hat{P}^T x,$$

where $\hat{\Lambda} = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_a)$ is the diagonal matrix that contains the eigenvalues associated with the $a$ retained principal components (denoted $l$ above). For testing data, a fault is declared when the $T^2$ value exceeds the threshold:

$$T^2 > T^2_\alpha,$$

where the threshold $T^2_\alpha$ is obtained from $F_\alpha(a, n-a)$, the critical value of the Fisher-Snedecor distribution with $a$ and $n-a$ degrees of freedom, and $\alpha$ sets the confidence level, which is generally assigned a value between 90 and 99%.
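As an illustration, the $T^2$ statistic can be evaluated from the retained loadings and eigenvalues as sketched below. The function names `fit_pca` and `t2_statistic` and the random data are hypothetical, and the threshold is deliberately left out because its scaling factor is not restated here.

```python
import numpy as np

def fit_pca(X, l):
    """Return the l retained loadings and eigenvalues of standardized data X."""
    n = X.shape[0]
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:l].T, s[:l] ** 2 / (n - 1)

def t2_statistic(X, P_hat, eig_hat):
    """T^2 for each row of X: squared scores normalized by their eigenvalues."""
    scores = X @ P_hat
    return np.sum(scores**2 / eig_hat, axis=1)

rng = np.random.default_rng(3)
X_train = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
X_train = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0, ddof=1)
P_hat, eig_hat = fit_pca(X_train, l=2)
t2_train = t2_statistic(X_train, P_hat, eig_hat)

# A sample displaced by 6 units along the first loading vector
x_shifted = 6.0 * P_hat[:, [0]].T
t2_shifted = t2_statistic(x_shifted, P_hat, eig_hat)
```

A displacement along a retained loading vector inflates $T^2$ in proportion to the inverse of the corresponding eigenvalue, which is why shifts along low-variance retained directions are flagged more readily.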

Q statistic
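The Q statistic measures the projection of the data onto the residual subspace and allows the user to measure how well the data fit the PCA model. The Q statistic is defined as follows [16]:

$$Q = \left\| (I_m - \hat{P}\hat{P}^T)\, x \right\|^2.$$

For testing data, a fault is declared when the threshold value is violated as follows [16]:

$$Q > Q_\alpha, \qquad Q_\alpha = \varphi_1 \left[ \frac{h_0 c_\alpha \sqrt{2\varphi_2}}{\varphi_1} + 1 + \frac{\varphi_2 h_0 (h_0 - 1)}{\varphi_1^2} \right]^{1/h_0},$$

with $\varphi_i = \sum_{j=l+1}^{m} \lambda_j^i$ for $i = 1, 2, 3$ and $h_0 = 1 - \frac{2\varphi_1\varphi_3}{3\varphi_2^2}$, where $c_\alpha$ is the value obtained from the normal distribution at significance $\alpha$.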
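A minimal sketch of the Q statistic and a threshold for it is given below. It assumes that the threshold is the widely used Jackson-Mudholkar approximation, which is consistent with the normal-distribution value $c_\alpha$ mentioned above; the function names and the synthetic data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def q_statistic(X, P_hat):
    """Squared norm of the part of each row of X not explained by the PCA model."""
    residual = X - X @ P_hat @ P_hat.T
    return np.sum(residual**2, axis=1)

def q_limit(eigvals, l, confidence=0.95):
    """Jackson-Mudholkar approximation of the Q threshold (assumed form)."""
    discarded = eigvals[l:]
    phi1, phi2, phi3 = (np.sum(discarded**i) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * phi1 * phi3 / (3.0 * phi2**2)
    c = NormalDist().inv_cdf(confidence)       # standard normal quantile
    term = h0 * c * np.sqrt(2.0 * phi2) / phi1 + 1.0 + phi2 * h0 * (h0 - 1.0) / phi1**2
    return phi1 * term ** (1.0 / h0)

rng = np.random.default_rng(4)
n, m, l = 500, 6, 2
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, m)) + 0.3 * rng.normal(size=(n, m))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, s, Vt = np.linalg.svd(X, full_matrices=False)
eigvals = s**2 / (n - 1)
P_hat = Vt[:l].T

Q = q_statistic(X, P_hat)
Q_lim = q_limit(eigvals, l, confidence=0.95)
alarms = Q > Q_lim                              # True where a fault would be declared
```

On fault-free training data, only a small fraction of observations should exceed the limit, roughly matching the chosen confidence level.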

Nonlinear fault detection methods using kernel transformations
A popular nonlinear version of PCA and PLS projects the nonlinear data onto a high-dimensional feature space, F, where the linear fault detection method is then applied. The authors in Ref. [28] used such a projection of X for PLS response surface modeling, with a quadratic function as the mapping function. However, it is difficult to know the exact nonlinear transformation that makes a nonlinear data matrix linear in the feature space. According to Mercer's theorem, a symmetric positive semidefinite function can be used to map the data into the feature space without knowing the explicit nonlinear function. This function is called the kernel function and is defined as the dot product of the mapped data in the feature space:

$$k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle.$$

Thus, kernel-based multivariate methods can be defined as nonlinear fault detection methods in which the input data matrix is mapped into a high-dimensional feature space, and the linear models developed above are applied in the feature space for fault detection purposes.
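Commonly used kernel functions are given below [29]:

Radial basis function (RBF): $k(x, y) = \exp\left(-\|x - y\|^2 / c\right)$

Polynomial function: $k(x, y) = \langle x, y \rangle^d$

Sigmoid function: $k(x, y) = \tanh\left(\beta_0 \langle x, y \rangle + \beta_1\right)$

Here $c$, $d$, $\beta_0$, and $\beta_1$ are kernel parameters that are specified by the user.

The next section describes the methodology of utilizing kernel transformations to extend linear PCA and PLS to the hyperdimensional space in order to carry out fault detection of nonlinear data.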
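The kernel functions listed above can be sketched as follows. The parameter names (`c`, `d`, `b0`, `b1`) and their default values are assumptions chosen for illustration, and `quadratic_map` is a hypothetical explicit feature map included only to show that a kernel evaluates a dot product in feature space without forming the mapping.

```python
import numpy as np

def rbf_kernel(X, Y, c=1.0):
    """k(x, y) = exp(-||x - y||^2 / c) for all pairs of rows of X and Y."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / c)

def polynomial_kernel(X, Y, d=2):
    """k(x, y) = <x, y>^d."""
    return (X @ Y.T) ** d

def sigmoid_kernel(X, Y, b0=1.0, b1=0.0):
    """k(x, y) = tanh(b0 <x, y> + b1)."""
    return np.tanh(b0 * X @ Y.T + b1)

def quadratic_map(X):
    """Explicit map for 2-D inputs whose dot product equals the degree-2 polynomial kernel."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

rng = np.random.default_rng(5)
X = rng.normal(size=(20, 2))
K_rbf = rbf_kernel(X, X)
K_poly = polynomial_kernel(X, X, d=2)
K_explicit = quadratic_map(X) @ quadratic_map(X).T   # same values as K_poly
```

For the degree-2 polynomial kernel the feature space is small enough to write down; for the RBF kernel it is infinite-dimensional, which is where evaluating the kernel directly becomes essential.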

Kernel principal component analysis (KPCA)
While PCA seeks to find the principal components by minimizing the data information loss in the input space, KPCA does this in the feature space (F). For KPCA learning using training data $X_1, X_2, \ldots, X_n \in \mathbb{R}^m$, the nonlinear mapping gives $\Phi: X \in \mathbb{R}^m \to Z \in \mathbb{R}^h$, where the input data are extended into the hyperdimensional feature space, whose dimension $h$ can be very large and possibly infinite [30].
The covariance in the feature space can be computed as follows [31]:

$$C^F = \frac{1}{n} \sum_{j=1}^{n} \Phi(X_j)\, \Phi(X_j)^T.$$

Similar to PCA, the principal components in the feature space can be found by diagonalizing the covariance matrix. In order to diagonalize the covariance matrix, it is necessary to solve the following eigenvalue problem in the feature space [31]:

$$\lambda v = C^F v,$$

where $\lambda \geq 0$ represents the eigenvalues and $v$ the corresponding eigenvectors.
In order to solve the eigenvalue problem, the following equation is derived [32]:

$$n \lambda \alpha = K \alpha,$$

where $K$ and $\alpha$ are the $n \times n$ kernel matrix and its eigenvectors, respectively.
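For a test vector $X$, the principal components ($t$) are extracted by projecting $\Phi(X)$ onto the eigenvectors $v_k$ in the feature space, where $k = 1, \ldots, l$:

$$t_k = \langle v_k, \Phi(X) \rangle = \sum_{i=1}^{n} \alpha_i^k \, \langle \Phi(X_i), \Phi(X) \rangle = \sum_{i=1}^{n} \alpha_i^k \, k(X_i, X).$$

It is important to note that before carrying out KPCA, it is necessary to mean center the data in the high-dimensional space. This can be accomplished by replacing the kernel matrix $K$ with the following [32]:

$$\tilde{K} = K - 1_n K - K 1_n + 1_n K 1_n,$$

where $1_n = \frac{1}{n} \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{bmatrix} \in \mathbb{R}^{n \times n}$.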
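The KPCA steps described above (kernel matrix, mean centering, eigenvalue problem, and projection of a test vector) can be sketched as follows. This is a minimal sketch assuming an RBF kernel with a hypothetical width `c`; the function names and the dictionary-based model are illustrative choices, and the eigenvectors are scaled so that the feature-space directions $v_k$ have unit norm.

```python
import numpy as np

def rbf(A, B, c):
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / c)

def kpca_fit(X, c, l):
    """Fit KPCA with l components on training data X (n x m)."""
    n = X.shape[0]
    K = rbf(X, X, c)
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n    # mean-centered kernel
    mu, A = np.linalg.eigh(Kc)                            # Kc alpha = (n lambda) alpha
    order = np.argsort(mu)[::-1][:l]
    mu, A = mu[order], A[:, order]
    alphas = A / np.sqrt(mu)                              # gives unit-norm v_k
    return {"X": X, "K": K, "c": c, "alphas": alphas, "eigvals": mu / n}

def kpca_scores(model, X_new):
    """Project new samples onto the retained kernel principal components."""
    X, K = model["X"], model["K"]
    n = X.shape[0]
    K_new = rbf(X_new, X, model["c"])
    one_n = np.full((n, n), 1.0 / n)
    one_t = np.full((X_new.shape[0], n), 1.0 / n)
    K_new_c = K_new - one_t @ K - K_new @ one_n + one_t @ K @ one_n
    return K_new_c @ model["alphas"]

rng = np.random.default_rng(6)
u = rng.uniform(-1.0, 1.0, 100)
X = np.column_stack([u, u**2 + 0.05 * rng.normal(size=100)])   # data on a curve
model = kpca_fit(X, c=1.0, l=3)
scores = kpca_scores(model, X)
```

Note that the test kernel is centered with the training kernel's means, mirroring the way testing data are scaled with training statistics in the linear case.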

$T^2$ statistic for KPCA
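Variation in the KPCA model can be found using the $T^2$ statistic, which is the sum of normalized squared scores, computed as follows [31]:

$$T^2 = [t_1, \ldots, t_l]\, \Lambda^{-1}\, [t_1, \ldots, t_l]^T,$$

where $t_k$ is obtained from Eq. (33) and $\Lambda$ is the diagonal matrix of the eigenvalues associated with the $l$ retained principal components.
The confidence limit is computed as follows [31]:

$$T^2_{l,n,\alpha} \sim \frac{l\,(n-1)}{n-l}\, F_{l,\,n-l,\,\alpha}.$$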

Q statistic for KPCA
In order to compute the Q statistic, the feature vector $\Phi(X)$ needs to be reconstructed. This is done by projecting $t_k$ into the feature space using $v_k$ as follows [31]:

$$\hat{\Phi}_l(X) = \sum_{k=1}^{l} t_k v_k.$$

The Q statistic in the feature space can now be computed as [31]

$$Q = \left\| \Phi(X) - \hat{\Phi}_l(X) \right\|^2 = \sum_{j=1}^{n} t_j^2 - \sum_{j=1}^{l} t_j^2.$$

The confidence limit of the Q statistic can then be computed using the following equation [31]:

$$Q_\alpha \sim g\, \chi^2_h.$$

This limit is based on Box's equation, obtained by fitting the reference distribution of the training data to a weighted $\chi^2$ distribution. Parameter $g$ is the weight assigned to account for the magnitude of the Q statistic, and $h$ represents the degrees of freedom. With $a$ and $b$ denoting the estimated mean and variance of the Q statistic, $g$ and $h$ are approximated using $g = b/(2a)$ and $h = 2a^2/b$.
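The moment-matching step behind this limit can be sketched as follows. To stay within the standard library, the chi-square quantile is computed here with the Wilson-Hilferty approximation, which is a substitution made for this sketch and not part of the method described above; the function names and the simulated Q values are likewise hypothetical.

```python
import numpy as np
from statistics import NormalDist

def chi2_quantile(h, confidence):
    """Wilson-Hilferty approximation of the chi-square quantile (non-integer h allowed)."""
    z = NormalDist().inv_cdf(confidence)
    return h * (1.0 - 2.0 / (9.0 * h) + z * np.sqrt(2.0 / (9.0 * h))) ** 3

def box_limit(q_train, confidence=0.95):
    """Weighted chi-square limit g * chi2_h fitted to the training Q values."""
    a = np.mean(q_train)                  # estimated mean of Q
    b = np.var(q_train, ddof=1)           # estimated variance of Q
    g = b / (2.0 * a)
    h = 2.0 * a**2 / b
    return g * chi2_quantile(h, confidence), g, h

rng = np.random.default_rng(7)
# Simulated training Q values that follow 2 * chi-square(3) exactly
q_train = 2.0 * rng.chisquare(3, size=20000)
limit, g, h = box_limit(q_train, confidence=0.95)
exceed_rate = np.mean(q_train > limit)
```

In this simulated case the fitted weight and degrees of freedom recover the generating values of about 2 and 3, and close to 5% of the training values exceed the limit.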

Kernel partial least square (KPLS)
The KPLS methodology works by mapping the data matrices into the feature space and then applying the nonlinear iterative partial least squares (NIPALS) algorithm to compute the loading and score vectors.
The mapped data points are given as

$$\Phi: X_i \in \mathbb{R}^m \to \Phi(X_i) \in F.$$

The kernel Gram function can be used to map the data into the feature space instead of explicitly using the nonlinear mapping function; this is called the kernel trick. The kernel Gram matrix is defined as the dot product of the mapped data:

$$K = \Phi \Phi^T.$$

As with the KPCA algorithm, the kernel matrix has to be mean centered using Eq. (34) before applying the NIPALS algorithm.
The input weights and scores are computed as in Algorithm 1: the weight vector $w$ is obtained by regressing $X$ on the output score $u$, and the input score vector is $t = Xw$ [33]. Combining these relations, the input score can be written, up to normalization, in terms of the output score as

$$t \propto X X^T u.$$

In the feature space, $X$ is replaced by its image $\Phi$:

$$t \propto \Phi \Phi^T u.$$

Substituting the kernel Gram matrix, $K = \Phi\Phi^T$, the input score is obtained directly from the kernel matrix:

$$t \propto K u.$$

After every iteration, the input kernel ($K$) and the output matrix ($Y$) are deflated. With the $\Phi\Phi^T$ dot product replaced by the kernel Gram matrix $K$ and $t$ normalized to unit length, the deflation reads

$$K \leftarrow (I - t t^T)\, K\, (I - t t^T), \qquad Y \leftarrow Y - t t^T Y.$$

Let $\{X_i\}_{i=1}^{n}$ be the training data and $\{X_j\}_{j=1}^{n}$ be the testing data; $\Phi(X_i)$ is the mapped training data, and $\Phi(X_j)$ is the mapped testing data. The kernel matrix for the testing data is given as

$$K_t = \Phi_t \Phi^T,$$

where $\Phi_t$ is the mapped testing data in the feature space obtained from $\{X_j\}_{j=1}^{n}$. The KPLS algorithm can also be used to predict the output matrix $Y$ from the input matrix $X$ as

$$\hat{Y}_t = \Phi_t B,$$

where $B$ is the regression coefficient, which is given as [34]

$$B = \Phi^T U \left( T^T K U \right)^{-1} T^T Y.$$

Thus, combining Eqs. (50) and (51), we get the predicted output quality matrix:

$$\hat{Y}_t = K_t U \left( T^T K U \right)^{-1} T^T Y.$$

Algorithm 2: Kernel partial least square (KPLS) algorithm
1. Compute the kernel matrix K.
2. Mean center the kernel matrix using Eq. (34).
3. For the first iteration, initialize the output score: $u = y_i$.
5. Deflate K and Y using Eq. (48).
6. Store the score vectors t and u in the cumulative matrices T and U.
7. Repeat steps 1 to 6 to extract M latent variables.
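A compact sketch of a kernel PLS iteration is given below. It follows the commonly cited kernel NIPALS formulation (scores from $t \propto Ku$, deflation of $K$ and $Y$, and dual regression coefficients $U (T^T K U)^{-1} T^T Y$), which is assumed here to correspond to Algorithm 2; the function names, the RBF width, and the sine-wave example are hypothetical.

```python
import numpy as np

def center_kernel(K):
    """Mean center a training kernel matrix in feature space."""
    n = K.shape[0]
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    return J @ K @ J

def kpls_fit(K, Y, n_latent, tol=1e-10, max_iter=500):
    """Kernel PLS on a centered kernel K (n x n) and centered outputs Y (n x p)."""
    n = K.shape[0]
    Kd, Yd = K.copy(), Y.copy()
    T, U = np.zeros((n, n_latent)), np.zeros((n, n_latent))
    for a in range(n_latent):
        u = Yd[:, [0]]                         # initialize u with a column of Y
        for _ in range(max_iter):
            t = Kd @ u
            t /= np.linalg.norm(t)             # unit-norm input score
            c = Yd.T @ t
            u_new = Yd @ c
            u_new /= np.linalg.norm(u_new)     # unit-norm output score
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        D = np.eye(n) - t @ t.T
        Kd = D @ Kd @ D                        # deflate the kernel matrix
        Yd = Yd - t @ (t.T @ Yd)               # deflate the output matrix
        T[:, [a]], U[:, [a]] = t, u
    dual = U @ np.linalg.solve(T.T @ K @ U, T.T @ Y)   # Y_hat = K @ dual
    return T, U, dual

rng = np.random.default_rng(8)
x = np.sort(rng.uniform(-3.0, 3.0, size=(80, 1)), axis=0)
y = np.sin(x) + 0.05 * rng.normal(size=(80, 1))
K = center_kernel(np.exp(-((x - x.T) ** 2) / 2.0))     # RBF kernel, assumed width
Yc = y - y.mean(axis=0)
T, U, dual = kpls_fit(K, Yc, n_latent=3)
Y_hat = K @ dual
```

For testing data, the same `dual` coefficients are applied to the centered test kernel; on the training data the prediction reduces to the projection of $Y$ onto the orthonormal scores.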

$T^2$ statistic for KPLS
The $T^2$ statistic for KPLS can be computed as

$$T^2 = t\, \Lambda^{-1} t^T,$$

where $\Lambda = (n-1)^{-1} T^T T$. Because the score matrix is orthonormal, $T^T T = I$, which leads to $\Lambda = (n-1)^{-1} I$. The score vector is $t = K_t Z$; hence the $T^2$ statistic is given by [33]

$$T^2 = (n-1)\, K_t Z Z^T K_t^T.$$

The threshold value for the $T^2$ statistic is computed using the inverse F-distribution [35], where $n$ and $m$ are the total number of observations and variables in the input data matrix $X$, respectively.

Q statistic for KPLS
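As with the other data-based models, the Q statistic computes the mean square error of the residual from the KPLS model, which is expressed in terms of the kernel Gram functions by substituting the dot products of the mapped points. The threshold value for the Q statistic under the significance level $\alpha$ [36] is given by

$$Q_\alpha = g\, \chi^2_{h,\alpha},$$

where $g$ and $h$ are given by $g = b/(2a)$ and $h = 2a^2/b$, with $a$ and $b$ the mean and variance of the Q statistic estimated from the training data. A fault is declared in the system if the Q statistic value is higher than the threshold value ($Q_\alpha$) for a new data set.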
The following section demonstrates the implementation of the fault detection methods described above and analyzes the effectiveness of all techniques.

Illustrative examples
The effectiveness of the kernel extensions of PCA and PLS for fault detection purposes will be demonstrated through two illustrative nonlinear examples, using a simulated synthetic data set and a simulated continuous stirred tank reactor (CSTR).

Simulated synthetic data
Synthetic nonlinear data can be simulated through the model given in [37], where $u$ is a variable that is defined between $-1$ and $1$ and $\varepsilon_i$ is independent white noise distributed uniformly between $-0.1$ and $0.1$. Training and testing data sets of 401 observations each are generated using this model. The performance of the KPCA and KPLS techniques is illustrated and compared to that of the conventional PCA and PLS methods for two different cases. In the first case, the sensor measuring the first variable $x_1$ is assumed to be faulty with a single fault. In the second case, multiple faults are assumed to occur in $x_1$, $x_2$, and $x_3$. Figure 3 shows the generated data.

Case 1
In this case, a single fault of magnitude unity is introduced between observations 200 and 250 in $x_1$ in the testing data set. The Gaussian kernel was chosen to model the nonlinearity in the process data. The most common fault detection metrics are the missed detection rate, the false alarm rate, and the out-of-control average run length ($ARL_1$). The missed detection rate is the percentage of observations in the faulty region that go undetected, while the false alarm rate is the percentage of observations in the non-faulty region that are flagged as faults. The false alarm and missed detection rates are also commonly referred to as Type I and Type II errors, respectively. $ARL_1$ is the number of observations it takes for a particular technique to flag a fault in the faulty region and is used to assess the speed of detection. The fault-free and faulty data are shown in Figures 4 and 5, respectively.
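The three metrics can be computed from a monitoring statistic, its threshold, and the known fault location as sketched below. The function name `detection_metrics` and the toy statistic are hypothetical, and $ARL_1$ is taken here as the number of observations from the start of the faulty region up to and including the first alarm, which is one reading of the definition above.

```python
import numpy as np

def detection_metrics(stat, limit, is_faulty):
    """Return missed detection rate (%), false alarm rate (%), and ARL1."""
    alarms = stat > limit
    missed = 100.0 * np.mean(~alarms[is_faulty])        # undetected faulty samples
    false_alarm = 100.0 * np.mean(alarms[~is_faulty])   # alarms on fault-free samples
    start = int(np.argmax(is_faulty))                   # first faulty observation
    hits = np.flatnonzero(alarms[start:] & is_faulty[start:])
    arl1 = int(hits[0]) + 1 if hits.size else None      # None if never detected
    return missed, false_alarm, arl1

n = 401
is_faulty = np.zeros(n, dtype=bool)
is_faulty[199:250] = True            # observations 200 to 250 (1-based, inclusive)

# Toy monitoring statistic: detects the fault two samples late, plus one false alarm
stat = np.zeros(n)
stat[201:250] = 2.0
stat[10] = 2.0
missed, false_alarm, arl1 = detection_metrics(stat, limit=1.0, is_faulty=is_faulty)
```

In this toy case, 2 of the 51 faulty observations are missed, 1 of the 350 fault-free observations raises an alarm, and the first alarm arrives on the third faulty observation.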
The fault detection (FD) performance of the PCA-, KPCA-, PLS-, and KPLS-based Q methods is shown in Figures 6 and 7 as well as Table 1. The results show that both the KPCA- and KPLS-based Q methods provide better FD performance than the linear PCA- and PLS-based Q methods and are able to detect the faults with lower missed detection rates, false alarm rates, and $ARL_1$ values (see Table 1).

Case 2
In this case, multiple faults of magnitude unity are introduced between observations 200 and 250 in $x_1$, 100 and 150 in $x_2$, and 385 and 401 in $x_3$ in the testing data set (as shown in Figure 8).
The FD performance of the kernel PCA and kernel PLS methods is illustrated and compared to that of the conventional PCA and PLS methods using the Q statistic. The Q statistic was chosen for the analysis because it is often better able to detect smaller faults using the residual space, and for simplicity of analysis. The fault detection performance of a particular process monitoring technique can be assessed using multiple fault detection metrics. As can be seen from Figures 9(a) and 10(a) and Table 2, the conventional linear PCA and PLS techniques are unable to effectively capture the nonlinearity present in the data set, which leads to entire sets of faults going undetected by both techniques. However, as demonstrated in Figures 9(b) and 10(b), the KPCA- and KPLS-based Q techniques are better able to detect the faults, with lower missed detection rates, false alarm rates, and $ARL_1$ values than the linear PCA and PLS methods (as shown in Table 2). These improved results can be attributed to the fact that the kernel techniques are able to capture the nonlinearity in the hyperdimensional feature space, providing better detection, especially in this case where there are multiple faults in the system.

Simulated CSTR model
In order to effectively assess the performance of the kernel PCA and kernel PLS techniques, it is also necessary to examine the performance of the techniques using an actual process application as well. A continuous stirred tank reactor model can be used to generate nonlinear data, and the fault detection charts can be applied to test their performance.

CSTR process description
The dynamic model of the CSTR used for this simulated example is the one given in [5], where $k_0$, $E$, $F$, and $V$ represent the reaction rate constant, activation energy, flow rate (both inlet and outlet), and reactor volume, respectively. The concentrations of A in the inlet stream and of B in the exit stream are represented by $C_A$ and $C_B$, respectively. The temperatures of the inlet stream and of the cooling fluid in the jacket are $T_i$ and $T_j$, respectively. $\Delta H$, $U$, $A$, $\rho$, and $C_p$ represent the heat of reaction, overall heat transfer coefficient, area through which heat is transferred to the cooling jacket, density, and heat capacity of all streams, respectively.
Using the described CSTR model, 1000 observations were generated, which were assumed to be initially noise-free. Zero-mean Gaussian noise with a signal-to-noise ratio of 20 was used to contaminate the noise-free process observations, in order to replicate reality. Figure 11 shows the generated CSTR data. This data set was then split into training and testing data sets of 500 observations each. Faults of magnitude 3σ were added to the temperature and concentration process variables in the testing data set at three different locations: observations 101-150, 251-350, and 401-450, where σ is the standard deviation of the particular process variable. Figures 12 and 13 show the fault-free and faulty data, respectively. Similar to the previous example, the performance of the kernel PCA and kernel PLS methods is compared to that of the conventional linear PCA and PLS methods using the Q statistic.
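The data preparation described above can be sketched as follows. Several choices here are assumptions rather than details taken from the chapter: the signal-to-noise ratio is interpreted as the ratio of signal variance to noise variance, σ is estimated from the training half, the faults are additive steps, and the sine-shaped temperature trace merely stands in for the CSTR simulation output.

```python
import numpy as np

def add_noise(x, snr, rng):
    """Add zero-mean Gaussian noise so that var(signal) / var(noise) = snr (assumed definition)."""
    noise_std = np.sqrt(np.var(x) / snr)
    return x + rng.normal(0.0, noise_std, size=x.shape)

def add_step_fault(x, windows, magnitude):
    """Add a constant bias over each (first, last) window of 1-based, inclusive observations."""
    x_faulty = x.copy()
    for first, last in windows:
        x_faulty[first - 1:last] += magnitude
    return x_faulty

rng = np.random.default_rng(9)
time = np.linspace(0.0, 10.0, 1000)
clean = 350.0 + 5.0 * np.sin(time)           # hypothetical noise-free temperature trace
noisy = add_noise(clean, snr=20.0, rng=rng)

train, test = noisy[:500], noisy[500:]       # 500 observations each
sigma = train.std(ddof=1)
windows = [(101, 150), (251, 350), (401, 450)]
test_faulty = add_step_fault(test, windows, 3.0 * sigma)
```

Only the testing half is modified, so the monitoring model is still built on fault-free data.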
For this example, comparing the two conventional techniques, we can see that the PCA-based Q statistic is unable to detect all faults (see Figure 14(a)), while the PLS-based Q statistic is better able to detect the faults (see Figure 15(a)). However, the kernel PCA- and kernel PLS-based Q techniques provide lower missed detection rates, false alarm rates, and $ARL_1$ values than their corresponding conventional techniques (see Figures 14(b) and 15(b)). These improved results can once again be attributed to the kernel techniques being able to effectively capture the nonlinearity of the data in the hyperdimensional feature space. The FD results for the two examples showed that the kernel PLS-based Q provides better performance relative to the kernel PCA-based Q. This is because kernel PCA is an input space model that cannot take the output measurements into consideration, whereas most chemical processes are described by input-output models.

Conclusion
In this chapter, nonlinear multivariate statistical techniques are used for fault detection. Kernel PCA and kernel PLS have been widely used to monitor various nonlinear processes, such as distillation columns and reactors. Thus, in the current work, both the kernel PCA and kernel PLS methods are used for nonlinear fault detection of chemical processes. The Q statistic, a commonly used fault detection index, is used to detect faults in the system. The fault detection performance using linear and nonlinear input models (PCA and kernel PCA) and input-output models (PLS and kernel PLS) is evaluated through two simulated examples, a synthetic data set and a continuous stirred tank reactor (CSTR). The missed detection rate, false alarm rate, and $ARL_1$ are the metrics used to compare the fault detection techniques. The results of the two case studies showed that the kernel PCA- and kernel PLS-based Q methods provide improved fault detection performance compared to the conventional PCA- and PLS-based Q methods.

Acknowledgements
This work was made possible by NPRP grant NPRP7-1172-2-439 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.