1. Introduction
Process monitoring is an essential aspect of nearly all industrial processes, often required both to ensure safe operation and to maintain product quality. Process monitoring is generally carried out in two phases: detection and diagnosis. This chapter focuses only on the fault detection aspect. Fault detection methods can be categorized in a number of different ways. One popular categorization is into quantitative model-based methods, qualitative model-based methods, and data-based (process history-based) methods [1–3]. Figure 1 illustrates a general schematic of the fault detection phase.
Quantitative model-based methods require knowledge of the process model, while qualitative model-based methods require expert knowledge of the given process. Hence, data-based methods are often used, as they require neither prior knowledge of the process model nor expert knowledge of the process [4].
Data-based monitoring methods can be further classified into input model-based methods and input-output model-based methods. Input model-based methods only require the data matrix of the input process variables, while input-output model-based methods require both the input and output data matrices in order to formulate a model and carry out fault detection [5]. Input model-based methods are sometimes utilized when the input-output models cannot be formed due to the high dimensionality and complexity of the system being monitored [6]. However, input-output model-based methods do have the added advantage of being able to detect faults in both the process and the quality variables [5].
Principal component analysis (PCA) is a widely used input model-based method that has been used for monitoring a number of processes including air quality [7], water treatment [8], and semiconductor manufacturing [9]. On the other hand, partial least squares (PLS) is an input-output model-based method that has been applied in chemical processes to monitor online measurement variables and also to monitor and predict the output quality variables [10]. PLS has been applied for the monitoring of distillation columns, batch reactors [11], continuous polymerization processes [12], and other similar industrial processes that are described by input-output models. However, both PCA and PLS are fault detection techniques that only work reasonably well with linear data. PCA and PLS have been extended to handle nonlinear data by utilizing kernels to transform the data to a higher dimensional space, where linear relationships between variables can be drawn. The extensions, kernel principal component analysis (KPCA) and kernel partial least squares (KPLS), have both shown improved performance over the conventional PCA and PLS techniques when handling nonlinear data [5, 13].
In our previous works [5, 13, 15], we addressed the problem of fault detection using linear and nonlinear input models (PCA and kernel PCA) and input-output models (PLS and kernel PLS) combined with the generalized likelihood ratio test (GLRT), in which the PCA, kernel PCA, PLS, and kernel PLS methods are used for modeling and the univariate GLRT chart is used for fault detection. In the current work, we propose to use the PCA, kernel PCA, PLS, and kernel PLS methods for multivariate fault detection through their multivariate Q and T^{2} charts.
The remainder of this chapter is organized as follows. Section 2 introduces linear PCA and PLS, along with the fault detection indices used for these methods. Section 3 then describes the idea of using kernels for nonlinear transformation of data, along with the kernel fault detection extensions, KPCA and KPLS. In Section 4, two illustrative examples are presented, one using simulated synthetic data and the other utilizing a simulated continuous stirred tank reactor. Finally, the conclusions are presented in Section 5.
2. Conventional linear fault detection methods
Before constructing either the PCA or PLS models, data are generally preprocessed to ensure that all process variables in the data matrix are scaled to zero mean and unit variance. This step is essential as different process variables are usually measured with varying standard deviations and means and often using different units.
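The preprocessing step above can be sketched as follows (a minimal NumPy sketch; the function name and interface are illustrative, and the training mean and standard deviation are reused for test data, which is the usual convention):

```python
import numpy as np

def standardize(X, mean=None, std=None):
    """Scale each column (process variable) to zero mean and unit variance.

    When mean/std are supplied (e.g., from training data), they are reused
    so that training and testing data share the same scaling.
    """
    if mean is None:
        mean = X.mean(axis=0)
        std = X.std(axis=0, ddof=1)
    return (X - mean) / std, mean, std
```
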
2.1. Principal component analysis (PCA)
Consider the following input data matrix, X ∈ R^{n × m}, where m and n represent the number of process variables and the number of observations, respectively. After preprocessing the data, singular value decomposition (SVD) can be utilized to express the input data matrix as follows:

X = U Σ V^T,

where U ∈ R^{n × n} and V ∈ R^{m × m} are orthogonal matrices and Σ ∈ R^{n × m} is a diagonal matrix containing the singular values of X in decreasing order. The columns of V are the loading vectors (principal component directions), and the eigenvalues of the covariance matrix of X are given by λ_i = σ_i^{2}/(n − 1), where σ_i is the i-th singular value.
Therefore, selection of the number of principal components is vital, and several methods have been developed for this purpose. A few popular techniques are cumulative percent variance (CPV) [13], scree plot and profile likelihood [18], and cross validation [19]. CPV is commonly utilized due to its computational simplicity and because it provides a good estimate of the number of principal components that need to be retained for most practical applications. CPV can be computed as follows [13]:

CPV(l) = (Σ_{i=1}^{l} λ_i / Σ_{i=1}^{m} λ_i) × 100%,

where λ_i is the i-th largest eigenvalue of the covariance matrix of X and l is the number of retained principal components.
CPV is used to select the smallest number of principal components that represents a certain percentage of the total variance (e.g., 99%). Once the number of principal components to retain, l, is determined, the input data matrix can then be expressed as [13]:

X = T P^T + E,

where T = X P ∈ R^{n × l} is the score matrix of the retained principal components, P ∈ R^{m × l} contains the retained loading vectors, and E is the residual matrix. After expansion, X can be expressed as [13]

X = X̂ + E = X P P^T + X(I − P P^T),

where the matrix X̂ = X P P^T is the modeled part of the data and E = X(I − P P^T) represents the projection onto the residual space.
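The PCA decomposition and the CPV-based selection of principal components described above can be sketched as follows (a NumPy sketch that assumes the data are already standardized; the 99% threshold and the function interface are illustrative):

```python
import numpy as np

def pca_cpv(X, cpv_threshold=0.99):
    """Fit a PCA model by SVD on standardized data and select the number of
    principal components l with the CPV criterion: the smallest l whose
    cumulative eigenvalue fraction exceeds the threshold."""
    n = X.shape[0]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigvals = s**2 / (n - 1)                # eigenvalues of the covariance matrix
    cpv = np.cumsum(eigvals) / eigvals.sum()
    l = int(np.searchsorted(cpv, cpv_threshold) + 1)
    P = Vt[:l].T                            # retained loading vectors
    T = X @ P                               # retained scores
    E = X - T @ P.T                         # residual part
    return P, T, E, l
```
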
The PCA model can be illustrated as shown in Figure 2.
2.2. Partial least squares (PLS)
PLS is a popular input-output technique used for modeling, regression, and classification, and it has been extended to fault detection purposes [20]. PLS relates the process variables (X ∈ R^{n × m}) and the quality variables (Y ∈ R^{n × p}) through a linear relationship between the input and output score vectors. The nonlinear iterative partial least squares (NIPALS) algorithm, developed by Wold et al., is used to compute the score matrices and loading vectors [21]:
X = T P^T + E,   Y = U Q^T + F,

where E ∈ R^{n × m} and F ∈ R^{n × p} are the PLS model residuals; T ∈ R^{n × M} and U ∈ R^{n × M} are the input and output score matrices, respectively; P and Q are the loading matrices of the input (X) and output (Y) matrices, respectively; m and n are the number of process variables and observations in the input (X) matrix; p is the number of quality variables in the output (Y) matrix; and M is the total number of latent variables extracted. The NIPALS method is shown in Algorithm 1. The X and Y matrices are first standardized by mean centering and scaling to unit variance. The NIPALS algorithm is initialized by assigning one of the columns of the output matrix (Y) as the output score vector (u); at each iteration, t, u, p, and q are computed and stored until M latent variables have been extracted.
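The NIPALS iteration described above can be sketched as follows (a minimal Python sketch, not a transcription of Algorithm 1: the convergence tolerance, the initialization with the first column of Y, and the function interface are illustrative assumptions, and the data are assumed already standardized):

```python
import numpy as np

def nipals_pls(X, Y, n_components, tol=1e-10, max_iter=500):
    """NIPALS PLS sketch: extract one latent variable per outer iteration
    (scores t, u; weight w; loadings p, q), then deflate X and Y."""
    X, Y = X.copy(), Y.copy()
    T, U, W, P, Q = [], [], [], [], []
    for _ in range(n_components):
        u = Y[:, [0]]                       # initialize u with a column of Y
        for _ in range(max_iter):
            w = X.T @ u / (u.T @ u)         # X weights
            w /= np.linalg.norm(w)
            t = X @ w                       # input scores
            q = Y.T @ t / (t.T @ t)         # Y loadings
            u_new = Y @ q / (q.T @ q)       # output scores
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        p = X.T @ t / (t.T @ t)             # X loadings
        X = X - t @ p.T                     # deflate X
        Y = Y - t @ q.T                     # deflate Y
        T.append(t); U.append(u); W.append(w); P.append(p); Q.append(q)
    return [np.hstack(m) for m in (T, U, W, P, Q)]
```

With a noise-free linear relationship and as many latent variables as input variables, the model reproduces Y essentially exactly, which is a quick sanity check of the deflation steps.
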
Other modifications of the NIPALS algorithm have been published in the literature [22], and different variations of the PLS technique have been reported. Qin et al. presented a recursive PLS model [23], in which the PLS model is updated with new training data; MacGregor et al. [24] developed a multiblock PLS model to monitor subsections of the process variables. For the monitoring of batch processes, PLS has been extended to the multiway PLS technique [25] to incorporate past batches in the training data set.
Since PLS is an input-output model, it can also be used as a regression tool to predict the quality variables (Y) from the online measurement variables (X). From the PLS model, the input and output matrices are related by
The regression coefficient B is computed as shown in Eq. (8):
Substituting the weights, the loading matrix P, and the constant C from Algorithm 1:
From Eqs. (7) and (9), the output matrix (Y) is predicted as
2.3. Fault detection indices
Different fault detection indices can be used with the linear PCA and PLS techniques. The two most popular indices are the T^{2} and Q statistics. T^{2} measures the variation within the model (the principal component subspace), while the Q statistic measures the variation in the residual space. These statistics are described next.
2.3.1. T^{2} statistic
The T^{2} statistic measures the variation in the principal components at different time samples and is defined as follows [26, 27]:

T^{2} = x^T P Λ^{-1} P^T x,

where x is a new observation vector, P contains the a retained loading vectors, and Λ is the diagonal matrix of the a retained eigenvalues. A fault is declared when T^{2} exceeds its threshold value,

T^{2}_α = [a(n − 1)(n + 1)/(n(n − a))] F_α(a, n − a),

where α is the level of significance, generally assigned a value between 90 and 99%, and F_α(a, n − a) is the critical value of the Fisher-Snedecor distribution with a and n − a degrees of freedom.
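As an illustration, the T^{2} statistic and its F-distribution limit can be computed as follows (a sketch; the limit prefactor a(n − 1)(n + 1)/(n(n − a)) is the standard form for a new observation and is an assumption here, and the function names are illustrative):

```python
import numpy as np
from scipy import stats

def t2_statistic(x, P, eigvals):
    """T^2 for one sample: squared scores normalized by the retained
    eigenvalues, i.e. t^T diag(eigvals)^{-1} t with t = P^T x."""
    t = P.T @ x
    return float(t @ (t / eigvals))

def t2_limit(n, a, alpha=0.99):
    """T^2 control limit from the F distribution with a and n - a degrees
    of freedom (a retained components, n training observations)."""
    f = stats.f.ppf(alpha, a, n - a)
    return a * (n - 1) * (n + 1) / (n * (n - a)) * f
```
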
2.3.2. Q statistic
The Q statistic measures the projection of the data onto the residual subspace and allows the user to measure how well the data fit the PCA model. The Q statistic is defined as follows [16]:

Q = ||(I − P P^T) x||^{2}.

For testing data, a fault is declared when the threshold value Q_α is violated as follows [16]:

Q > Q_α,

where the threshold Q_α is estimated from the training data.
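A minimal sketch of the Q (squared prediction error) computation, assuming the residual-projection form ||(I − P P^T) x||^{2} for retained loadings P:

```python
import numpy as np

def q_statistic(x, P):
    """Q (SPE): squared norm of the projection of x onto the residual
    subspace, i.e. ||(I - P P^T) x||^2."""
    r = x - P @ (P.T @ x)    # residual after removing the modeled part
    return float(r @ r)
```

A sample lying entirely in the principal component subspace gives Q = 0, while a sample orthogonal to it gives Q equal to its full squared norm.
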
3. Nonlinear fault detection methods using kernel transformations
A popular nonlinear extension of PCA and PLS is the projection of nonlinear data into a high-dimensional feature space F, where the linear fault detection method is then applied. The authors in Ref. [28] used a projection of X for PLS response surface modeling with a quadratic mapping function:
However, it is difficult to know the exact nonlinear transformation under which a nonlinear data matrix becomes linear in the feature space. According to Mercer's theorem, any continuous, symmetric, positive semidefinite function can be used to map the data into the feature space without knowing the explicit nonlinear mapping. Such a function is called a kernel function and is defined as the dot product of the mapped data in the feature space:

k(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩.
Thus, kernel-based multivariate methods can be defined as nonlinear fault detection methods in which the input data matrix is mapped into a high-dimensional feature space, where the linear models developed above can be applied for fault detection purposes.
Commonly used kernel functions are given below [29]:

Radial basis function (RBF): k(x, y) = exp(−||x − y||^{2}/c)

Polynomial function: k(x, y) = (⟨x, y⟩ + c)^{d}

Sigmoid function: k(x, y) = tanh(β_{0}⟨x, y⟩ + β_{1})

where c, d, β_{0}, and β_{1} are kernel parameters chosen by the user.
The next section describes the methodology of utilizing kernel transformations to extend linear PCA and PLS to the hyperdimensional space in order to carry out fault detection of nonlinear data.
3.1. Kernel principal component analysis (KPCA)
While PCA seeks to find the principal components by minimizing the data information loss in the input space, KPCA does this in the feature space (F). For KPCA learning, the training data are first mapped into the feature space via the nonlinear mapping Φ.
The covariance matrix in the feature space can be computed as follows [31]:

C^F = (1/n) Σ_{j=1}^{n} Φ(x_j) Φ(x_j)^T.

Similar to PCA, the principal components in the feature space can be found by diagonalizing the covariance matrix. In order to diagonalize the covariance matrix, it would be necessary to solve the following eigenvalue problem in the feature space [31]:

λ v = C^F v,

where λ ≥ 0 are the eigenvalues and v the corresponding eigenvectors of C^F. Since each eigenvector lies in the span of the mapped data, the following equivalent eigenvalue problem is derived [32]:

n λ α = K α,

where K and α are the n × n kernel matrix, with entries K_{ij} = ⟨Φ(x_i), Φ(x_j)⟩, and the vector of expansion coefficients of v in terms of the mapped data, respectively.
For a test vector x, the principal components t_k are extracted by projecting Φ(x) onto the eigenvectors v_k in the feature space, where k = 1, …, l:

t_k = ⟨v_k, Φ(x)⟩ = Σ_{i=1}^{n} α_i^k k(x_i, x).
It is important to note that before carrying out KPCA, it is necessary to mean center the data in the high-dimensional space. This can be accomplished by replacing the kernel matrix K with the following [32]:

K̃ = K − 1_n K − K 1_n + 1_n K 1_n,

where 1_n is the n × n matrix whose entries are all equal to 1/n.
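The mean centering of the kernel matrix can be sketched as follows (a NumPy sketch of the standard centering formula K − 1_n K − K 1_n + 1_n K 1_n, with 1_n the n × n matrix of entries 1/n):

```python
import numpy as np

def center_kernel(K):
    """Mean-center a kernel matrix in feature space:
    K_c = K - 1_n K - K 1_n + 1_n K 1_n."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n
```

A quick check is that the centered kernel has zero row and column sums, since centering annihilates the constant direction.
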
3.1.1. T^{2} statistic for KPCA
Variation in the KPCA model can be found using the T^{2} statistic, which is the sum of the normalized squared scores, computed as follows [31]:

T^{2} = Σ_{k=1}^{l} t_k^{2}/λ_k,

where t_k is obtained from Eq. (33) and λ_k is the corresponding eigenvalue. The confidence limit is computed as follows [31]:

T^{2}_α = [l(n − 1)(n + 1)/(n(n − l))] F_α(l, n − l).
3.1.2. Q statistic for KPCA
In order to compute the Q statistic, the feature vector Φ(x) needs to be reconstructed. This is done by projecting the scores t_k back into the feature space using the eigenvectors v_k as follows [31]:

Φ̂(x) = Σ_{k=1}^{l} t_k v_k.

The Q statistic in the feature space can now be computed as [31]

Q = ||Φ(x) − Φ̂(x)||^{2}.

The confidence limit of the Q statistic can then be computed using the following equation [31]:

Q_α = g χ^{2}_{h,α}.
This limit is based on Box's approximation, obtained by fitting the reference distribution of the Q statistic over the training data to a weighted χ^{2} distribution. The parameter g is the weight that accounts for the magnitude of the Q statistic, and h represents the degrees of freedom. With a and b the estimated mean and variance of the Q statistic, g and h are approximated as g = b/(2a) and h = 2a^{2}/b.
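Box's approximation described above can be sketched as follows (using SciPy's chi-squared quantile; the function interface is illustrative):

```python
import numpy as np
from scipy import stats

def q_limit_box(q_train, alpha=0.99):
    """Q control limit via Box's approximation: fit g * chi2(h) to the
    training Q values with g = b/(2a) and h = 2a^2/b, where a and b are
    the sample mean and variance of Q."""
    a = np.mean(q_train)
    b = np.var(q_train, ddof=1)
    g = b / (2.0 * a)
    h = 2.0 * a**2 / b
    return g * stats.chi2.ppf(alpha, h)
```

When the training Q values are themselves chi-squared distributed, the fitted limit should recover the exact quantile, which gives a simple sanity check.
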
3.2. Kernel partial least square (KPLS)
The KPLS methodology works by mapping the data matrices into the feature space and then applying the partial least squares algorithm in that space to compute the loading and score vectors, which yields a nonlinear model in the original variables.
The mapped data points are given as Φ = [Φ(x_1), …, Φ(x_n)]^T. The kernel Gram matrix can be used in place of the explicit nonlinear mapping function; this is called the kernel trick. The kernel Gram matrix is defined through the dot products of the mapped points:

K = Φ Φ^T.
As with the KPCA algorithm, the kernel matrix has to be mean centered before applying the NIPALS algorithm using Eq. (34).
The input score vector and weights are computed as w = Φ^T u and t = Φ w. Thus, in terms of the kernel matrix, the score vector is given as [33]

t = Φ Φ^T u = K u,

where t is subsequently normalized to unit length.
Now, the relationship between the input and output score matrices can be derived by combining Eqs. (15), (17), and (18):
In the feature space, replace X by its image Φ:
Substituting the kernel Gram matrix, K = ΦΦ^T, the input and output scores are given by
After every iteration, the input kernel matrix (K) and the output matrix (Y) are deflated as

K ← (I − t t^T) K (I − t t^T),   Y ← Y − t t^T Y,

where t is the current (normalized) input score vector.
ΦΦ^{T} dot product is replaced by kernel gram function K:
Let
The KPLS model can also be used to predict the output matrix Y from the input matrix X as

Φ_{t} is the mapped testing data in the feature space.

Thus, combining Eqs. (50) and (51), the predicted output quality matrix is obtained:
Algorithm 2: Kernel partial least square (KPLS) algorithm 


3.2.1. T^{2} statistic for KPLS
The T^{2} statistic for KPLS can be computed as
where
The threshold value for the T^{2} statistic is computed using the inverse of the F distribution and is given by [35]
where n and m are the total number of observations and variables in the input data matrix X, respectively, and
3.2.2. Q statistic for KPLS
As with the other data-based models, the Q statistic computes the mean square error of the residuals from the KPLS model:
Substituting the kernel Gram matrices as the dot products of the mapped points:
where
The threshold value for the Q statistic under the significance level of α [36] is given by
where g and h are computed from the training Q values as before, g = b/(2a) and h = 2a^{2}/b.
A fault is declared in the system if the Q statistic value for the new data set is higher than the threshold value (Q_α).
The following section demonstrates the implementation of the fault detection methods described above and analyzes the effectiveness of all techniques.
4. Illustrative examples
The effectiveness of the kernel extensions of PCA and PLS for fault detection purposes will be demonstrated through two illustrative nonlinear examples, using a simulated synthetic data set and a simulated continuous stirred tank reactor (CSTR).
4.1. Simulated synthetic data
Synthetic nonlinear data can be simulated through the following model [37]:
where u is a variable defined between −1 and 1 and the ε_i are independent noise variables distributed uniformly between −0.1 and 0.1. Training and testing data sets of 401 observations each are generated using the model above. The performance of the KPCA and KPLS techniques is illustrated and compared to the conventional PCA and PLS methods for two different cases. In the first case, a single fault is introduced in the sensor measuring the first variable, while in the second case, multiple faults are introduced.
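Since the model equations of Ref. [37] are not reproduced above, the sketch below generates data of the same general shape: three variables driven nonlinearly by u ∈ [−1, 1] plus independent uniform noise on [−0.1, 0.1]. The specific polynomial relations in the code are illustrative assumptions, not the model of Ref. [37]:

```python
import numpy as np

def generate_synthetic(n=401, seed=0):
    """Generate nonlinear single-factor data: u is sampled on [-1, 1] and
    three variables are hypothetical nonlinear functions of u plus
    independent uniform noise on [-0.1, 0.1] (illustrative model only)."""
    rng = np.random.default_rng(seed)
    u = np.linspace(-1.0, 1.0, n)
    eps = rng.uniform(-0.1, 0.1, size=(n, 3))
    X = np.column_stack([
        u,                       # x1: linear in u (assumed)
        u**2 - 3.0 * u,          # x2: quadratic in u (assumed)
        -(u**3) + 3.0 * u**2,    # x3: cubic in u (assumed)
    ]) + eps
    return X, u
```
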
Figure 3 shows the generated data.
Case 1
In this case, a single fault of magnitude unity is introduced between observations 200 and 250 in the first variable.
The fault detection (FD) performance of the PCA-, KPCA-, PLS-, and KPLS-based Q methods is shown in Figures 6 and 7, as well as in Table 1. The results show that both the KPCA- and KPLS-based Q provide a better FD performance than the linear PCA- and PLS-based Q methods and are able to detect the faults with lower missed detection rates, false alarm rates, and ARL_{1} values (see Table 1).
Method                     Missed detection (%)    False alarm (%)    ARL_{1}
PLS-based Q statistic      90.1961                 13.1429            36
KPLS-based Q statistic     3.9216                  0                  2
PCA-based Q statistic      100                     7.4286             N/A
KPCA-based Q statistic     27.4510                 5.1429             1

Table 1. Fault detection results for the simulated synthetic data (Case 1).
Case 2
In this case, multiple faults of magnitude unity are introduced between observations 200 and 250 in multiple variables.
The FD performance of the kernel PCA and kernel PLS methods is illustrated and compared to that of the conventional PCA and PLS methods using the Q statistic. The Q statistic was chosen for analysis since it is often better able to detect smaller faults using the residual space, and for simplicity of analysis. The fault detection performance of each technique is quantified using several metrics: the missed detection rate, the false alarm rate, and ARL_{1}.
As can be seen from Figures 9(a) and 10(a) and Table 2, the conventional linear PCA and PLS techniques are unable to effectively capture the nonlinearity present in the data set, which leads to entire sets of faults going undetected by both techniques. However, as demonstrated in Figures 9(b) and 10(b), the KPCA- and KPLS-based Q techniques are better able to detect the faults, with lower missed detection rates, false alarm rates, and ARL_{1} values than the linear PCA and PLS methods (as shown in Table 2). These improved results can be attributed to the fact that the kernel techniques are able to capture the nonlinearity in the hyperdimensional feature space, providing better detection, especially in this case where there are multiple faults in the system.
4.2. Simulated CSTR model
In order to effectively assess the performance of the kernel PCA and kernel PLS techniques, it is also necessary to examine them on an actual process application. A continuous stirred tank reactor model can be used to generate nonlinear data, to which the fault detection charts can be applied to test their performance.
4.2.1. CSTR process description
The dynamics of the CSTR utilized for this simulated example are represented as follows [5]:
where k_{0}, E, F, and V represent the reaction rate constant, activation energy, flow rate (both inlet and outlet), and reactor volume, respectively. The concentrations of A in the inlet stream and of B in the exit stream are represented by C_{A} and C_{B}, respectively. The temperatures of the inlet stream and of the cooling fluid in the jacket are T_{i} and T_{j}, respectively. ΔH, U, A, ρ, and C_{p} represent the heat of reaction, overall heat transfer coefficient, area through which the heat transfers to the cooling jacket, density, and heat capacity of all streams, respectively.
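As an illustration of simulating such a reactor, the sketch below integrates a generic non-isothermal CSTR (A → B) with Arrhenius kinetics by explicit Euler. All parameter values and the lumped groups (F/V, (−ΔH)/(ρC_p), UA/(VρC_p)) are illustrative assumptions, not those of the chapter's simulation:

```python
import numpy as np

def simulate_cstr(n_steps=1000, dt=0.05):
    """Explicit-Euler simulation of a generic non-isothermal CSTR (A -> B).

    States: C_A (kmol/m^3) and T (K). All parameter values below are
    illustrative assumptions.
    """
    F_V = 1.0                    # F/V, inverse residence time (1/min)
    k0, E_R = 7.2e10, 8750.0     # Arrhenius pre-exponential (1/min) and E/R (K)
    heat_gen = 5.0               # (-dH)/(rho*Cp), K*m^3/kmol
    UA_VrhoCp = 1.0              # U*A/(V*rho*Cp) (1/min)
    C_Ai, T_i, T_j = 1.0, 350.0, 300.0   # inlet concentration, inlet and jacket T
    C_A, T = 0.5, 350.0          # initial conditions
    traj = np.empty((n_steps, 2))
    for i in range(n_steps):
        r = k0 * np.exp(-E_R / T) * C_A                 # reaction rate
        dC_A = F_V * (C_Ai - C_A) - r                   # mole balance on A
        dT = (F_V * (T_i - T) + heat_gen * r            # energy balance
              - UA_VrhoCp * (T - T_j))
        C_A += dt * dC_A
        T += dt * dT
        traj[i] = (C_A, T)
    return traj
```

With these (assumed) parameters the reactor settles to a steady state between the jacket and inlet temperatures; Gaussian noise and step faults can then be added to the resulting trajectories, as described above.
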
Using the described CSTR model, 1000 observations were generated, which were assumed to be initially noise-free. Zero-mean Gaussian noise with a signal-to-noise ratio of 20 was used to contaminate the noise-free process observations, in order to replicate reality. Figure 11 shows the generated CSTR data. This data set was then split into training and testing data sets of 500 observations each. Faults of magnitude 3σ were added to the temperature and concentration process variables in the testing data set at three different locations: observations 101–150, 251–350, and 401–450, where σ is the standard deviation of the particular process variable. Figures 12 and 13 show the fault-free and faulty data, respectively. Similar to the previous example, the performance of the kernel PCA and kernel PLS methods is compared to the conventional linear PCA and PLS methods using the Q statistic.
For this example, comparing the two conventional techniques, we can see that the PCA-based Q statistic is unable to detect all the faults (see Figure 14(a)), while the PLS-based Q model is able to better detect the faults (see Figure 15(a)). However, the kernel PCA- and kernel PLS-based Q techniques are able to provide results with lower missed detection rates, false alarm rates, and ARL_{1} values than their corresponding conventional techniques (see Figures 14(b) and 15(b)). These improved results can once again be attributed to the kernel techniques being able to effectively capture the nonlinearity of the data in the hyperdimensional feature space. The FD results from the two examples also show that the kernel PLS-based Q provides improved performance compared to the kernel PCA-based Q. This is because kernel PCA is an input-space model and cannot take the output quality measures into consideration, while most chemical processes are described by input-output models.
5. Conclusion
In this chapter, nonlinear multivariate statistical techniques were used for fault detection. Kernel PCA and kernel PLS have been widely used to monitor various nonlinear processes, such as distillation columns and reactors. Thus, in the current work, both the kernel PCA and kernel PLS methods were used for nonlinear fault detection of chemical processes. The Q statistic, a commonly used fault detection index, was used to detect faults in the system. The fault detection performance using linear and nonlinear input models (PCA and kernel PCA) and input-output models (PLS and kernel PLS) was evaluated through two simulated examples: a synthetic data set and a continuous stirred tank reactor (CSTR). The missed detection rate, false alarm rate, and ARL_{1} were the metrics used to compare the fault detection techniques. The results of the two case studies showed that the kernel PCA- and kernel PLS-based Q provide improved fault detection performance compared to the conventional PCA- and PLS-based Q methods.