Principal component analysis (PCA) is a linear data analysis technique widely used for fault detection and isolation, data modeling, and noise filtration. PCA may be combined with statistical hypothesis testing methods, such as the generalized likelihood ratio (GLR) technique, in order to detect faults. GLR uses maximum likelihood estimation (MLE) to maximize the detection rate for a fixed false alarm rate. The benchmark Tennessee Eastman Process (TEP) is used to examine the performance of the different techniques, and the results show that for processes that experience shifts in the mean and/or variance, the best performance is achieved by independently monitoring the mean and variance using two separate GLR charts, rather than simultaneously monitoring them using a single chart. Moreover, single-valued data can be aggregated into interval form in order to provide a more robust model with improved fault detection performance using PCA and GLR. The TEP example is used once more to demonstrate the effectiveness of interval-valued data over single-valued data.
- principal component analysis
- generalized likelihood ratio
- hypothesis testing
- fault detection
- Tennessee Eastman Process
- interval data
Current technological advancements allow data to be collected from many different sources. The abundance of data collected from different sensors is beneficial, as it can be used to observe trends between and within the measured process variables. This allows process models to be developed to help identify whether different processes or applications are behaving as expected. Additionally, with industrial growth present in many developing countries, efficient process monitoring is essential for newer and more complex processes. Monitoring of these processes is required in order to ensure process safety, maintain product quality, increase economic benefits, and ensure that the process adheres to strict environmental regulations.
Statistical process monitoring methods can be classified into three broad categories: quantitative model-based methods, qualitative model-based methods, and process history-based methods [3, 4, 5]. Quantitative model-based methods, for example Kalman filters, require detailed knowledge of a process in order to construct a model that can be used for monitoring, while qualitative model-based methods, for example fault trees, require process engineering experts to develop monitoring procedures or tasks. When neither requirement can be met, and given the complexity of many processes that require monitoring, data-based techniques are commonly used in industry for applications ranging from drug design to drinking water treatment [5, 6, 7].
Principal component analysis (PCA) is a powerful linear data analysis technique widely used in research and industrial applications for fault detection and isolation, data modeling and reconstruction, feature extraction, and noise filtration. PCA extracts the dominant underlying information from a dataset without any prior knowledge of the model. One practical application of PCA uses data gathered from parallel sensors to quantify the quality of a given food sample. PCA is used to reduce the dimensionality of a dataset while filtering out variability caused by noise. The PCA model has been used to monitor a wide variety of processes, and has seen many extensions [10, 11, 12, 13]. Two main fault detection statistics are typically used with a PCA model: Hotelling’s T2 statistic and the Q statistic. Variations captured by the principal component space are monitored using the T2 statistic, while variations in the residual space are monitored using the Q statistic.
Statistical hypothesis testing methods, on the other hand, use statistical techniques to determine whether observations collected from a given process follow the null hypothesis, that is, normal operating conditions, or the alternate hypothesis, that is, abnormal or faulty operating conditions. These faults can be of different types, such as shifts in the mean, variance, or both. The generalized likelihood ratio (GLR) technique has received a lot of attention in process monitoring literature [10, 11, 13, 16]. The GLR method aims to maximize the detection rate for a fixed false alarm rate. Therefore, an objective of this work is to provide a comparative review of the different GLR charts using examples such as the benchmark Tennessee Eastman Process (TEP).
Data utilized in the construction of a PCA model may be of two types depending on the application being monitored: single-valued and interval-valued. Single-valued data can be directly obtained from sensors measuring particular variables in a process, while interval-valued data is aggregated or artificially generated from batch single-valued measurements, thereby resulting in a range of possible measurement values for a given process variable at one time instant. The use of interval data in fault detection was originally introduced in order to reduce large datasets to a more manageable size without compromising the integrity of the dataset. In addition, interval data is beneficial because of its inherent ability to deal with missing values in samples, which may occur due to malfunctioning sensors or varying sampling frequencies between variables.
However, in cases where reducing the dataset is not viable, because of a relatively limited sample size or sampling frequency, interval data can still be generated using a moving window aggregation method. The same applies when batch process monitoring is not a viable option and real-time online monitoring of samples is required. The benchmark TEP example will be used once more to analyze the benefit of moving window interval aggregation for the fault detection performance of PCA and GLR.
The rest of this chapter is organized as follows. Section 2 provides a more detailed introduction to PCA, along with a quick overview of the fault detection statistics used to examine the performance of the methods discussed in this chapter. Section 3 introduces hypothesis testing methods and the different GLR charts. Section 4 explains the moving window interval aggregation method and its integration with PCA and GLR for fault detection. Section 5 presents illustrative examples, using simulated synthetic data and the TEP, that demonstrate the effect of GLR and interval data on fault detection performance. Conclusions are presented in Section 6.
2. Principal component analysis (PCA)
Principal component analysis (PCA) is a linear dimensionality reduction tool used to reduce the number of variables in a dataset, whilst retaining most of the data’s variability. PCA finds a new set of variables, called principal components, using a linear combination of the dataset’s original cross-correlated variables . The algorithm for PCA is summarized below.
2.1 PCA algorithm
Given a classical training dataset $X \in \mathbb{R}^{n \times m}$, where $n$ is the number of sample rows and $m$ is the number of variable columns, the PCA model is found as follows:
1. Find the correlation matrix $C$ of $X$.
2. Find the column eigenvector matrix $P$ and the diagonal eigenvalue matrix $\Lambda$ of $C$. Each eigenvector defines the linear combination coefficients used to find the principal components from the original variables, and each eigenvalue represents the amount of variance that its respective principal component covers in the dataset.
3. Retain the first $\ell$ principal components, which cover the minimum desired variability in the dataset; the matrix of retained eigenvectors is denoted as $\hat{P}$.
4. Find the predictive transformation matrix, $\hat{C} = \hat{P}\hat{P}^T$.
5. Find the residual transformation matrix, $\tilde{C} = I - \hat{C}$.
$\hat{C}$ is used to find the projection of the dataset onto the PCA model, and $\tilde{C}$ is used to find the amount of deviation of the dataset from its projection onto the PCA model, also known as the matrix of residuals, $\tilde{X} = X\tilde{C}$. For more comprehensive details, please refer to [9, 19, 20].
The training dataset $X$ defines the system under normal or optimal operating conditions, where there are no faults and the noise is minimal. Consequently, $X$ is used to find the PCA model, defined by the $\hat{C}$ and $\tilde{C}$ transformation matrices. The testing dataset $X_{test}$ defines the system under unknown operating conditions, and it is monitored for faults using its residuals $\tilde{X}_{test} = X_{test}\tilde{C}$, as will be discussed later.
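The steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names, the dictionary layout, the autoscaling step, and the 95% variance cut-off are all assumptions made for the example.

```python
import numpy as np

def fit_pca(X_train, var_to_keep=0.95):
    """Fit a PCA model on fault-free training data (rows = samples)."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std                       # autoscale each variable
    corr = (Z.T @ Z) / (Z.shape[0] - 1)              # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                # sort by variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum_var = np.cumsum(eigvals) / eigvals.sum()
    # Smallest number of components whose cumulative variance reaches the target
    n_pc = min(int(np.searchsorted(cum_var, var_to_keep)) + 1, len(eigvals))
    P_hat = eigvecs[:, :n_pc]                        # retained eigenvectors
    C_hat = P_hat @ P_hat.T                          # predictive transformation
    C_tilde = np.eye(len(eigvals)) - C_hat           # residual transformation
    return {"mean": mean, "std": std, "n_pc": n_pc, "P_hat": P_hat,
            "eigvals": eigvals[:n_pc], "C_hat": C_hat, "C_tilde": C_tilde}

def residuals(model, X):
    """Scale data with the training statistics and project onto the residual space."""
    Z = (X - model["mean"]) / model["std"]
    return Z @ model["C_tilde"]
```

Testing data are passed through `residuals` with the training mean and standard deviation, so that the model always describes normal operating conditions only.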
2.2 Fault detection statistics
Knowing the optimal number of eigenvectors or principal components to retain, fault detection is then carried out by evaluating the PCA model’s residuals using any detection statistic. This section will focus on briefly introducing the two most well-known statistics in literature: The Q and T2 statistics.
The Q statistic of the $i$th sample of a classical residual matrix $\tilde{X}$ is defined as:
$$Q_i = \tilde{x}_i \tilde{x}_i^T = \lVert \tilde{x}_i \rVert^2, \quad (1)$$
where $\tilde{x}_i$ is the $i$th row of $\tilde{X}$.
The Q statistic of the training data, $Q_{train}$, is used to find the Q-threshold value $Q_\alpha$, which defines the maximum possible value for a testing sample’s Q statistic, denoted as $Q_{test}$, beyond which the sample will be declared as a fault [14, 19, 21]. The threshold is calculated using the empirical cumulative distribution function (CDF) of $Q_{train}$, which is an estimate of the true CDF of its discrete values.
The fault detection performance is tabulated by comparing $Q_{test}$ with $Q_\alpha$. If $Q_{test,i} > Q_\alpha$, then the $i$th sample is declared as faulty, otherwise it is normal. There are two metrics used for benchmarking each method: false alarm rate (FAR) and detection rate (DR).
FAR is the average percentage of samples that were wrongfully declared as faults. The detection rate is the average percentage of samples that were rightfully declared as faults. It is desirable to maximize DR, for a fixed FAR, in order to have a better fault detector.
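As a rough sketch of how the Q statistic, its empirical threshold, and the two metrics fit together, consider the snippet below. The helper names and the 5% target false alarm rate are assumptions made for illustration; the threshold is taken as an empirical quantile of the training statistic, which is one simple way to read a limit off an empirical CDF.

```python
import numpy as np

def q_statistic(residual_matrix):
    """Squared norm of each residual row (one Q value per sample)."""
    return np.sum(residual_matrix ** 2, axis=1)

def empirical_threshold(stat_train, far=0.05):
    """Limit exceeded by roughly a fraction `far` of the fault-free training statistic."""
    return np.quantile(stat_train, 1.0 - far)

def far_and_dr(stat_test, threshold, is_faulty):
    """False alarm rate and detection rate, both in percent."""
    alarms = stat_test > threshold
    far = 100.0 * alarms[~is_faulty].mean()   # alarms raised on fault-free samples
    dr = 100.0 * alarms[is_faulty].mean()     # alarms raised on faulty samples
    return far, dr
```

Here `is_faulty` is a boolean array marking the samples that lie in the known faulty region, which is only available in benchmark studies where the fault location is known.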
Alternatively, the Hotelling T2 statistic, which measures variations in the principal component space, can be used. For the $i$th sample $x_i$ (a row vector), it is computed as follows:
$$T_i^2 = x_i \hat{P} \hat{\Lambda}^{-1} \hat{P}^T x_i^T, \quad (2)$$
where $\hat{P}$ is the matrix of retained eigenvectors and $\hat{\Lambda} = \mathrm{diag}(\lambda_1, \ldots, \lambda_\ell)$ is a diagonal matrix that contains the eigenvalues associated with the retained principal components. The threshold for the T2 statistic can be computed either computationally or empirically. The Q statistic is often used by authors instead of the T2 statistic as it is better able to detect smaller faults [10, 11].
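A minimal sketch of the T2 computation is shown below, assuming the data have already been scaled and that the retained eigenvectors and eigenvalues are available as arrays; the function and argument names are hypothetical.

```python
import numpy as np

def t2_statistic(Z, P_hat, eigvals_hat):
    """Hotelling T2 per sample: squared scores weighted by inverse eigenvalues."""
    scores = Z @ P_hat                                # project onto retained components
    return np.sum(scores ** 2 / eigvals_hat, axis=1)
```

Dividing each squared score by its eigenvalue is equivalent to the quadratic form with the inverse diagonal eigenvalue matrix, so no explicit matrix inverse is needed.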
3. Hypothesis testing methods
Hypothesis testing methods, such as the generalized likelihood ratio (GLR), have received a lot of attention in recent literature [10, 13, 23]. Hypothesis testing methods use fundamental statistical theory to determine whether given data conforms to a targeted distribution, that is, a null hypothesis, or deviates from this distribution and follows an alternative distribution, that is, an alternate hypothesis. In process monitoring terms, the parameters of the null and alternate hypotheses are defined using data from normal and abnormal operating conditions, respectively.
3.1 Generalized likelihood ratio
The generalized likelihood ratio (GLR) technique defines the alternate hypotheses by parameters that can assume an infinite number of values, and is therefore called a composite hypothesis. An efficient point estimation method that utilizes the concept of maximum likelihood estimates (MLEs) is employed in order to estimate the required parameters.
The univariate GLR chart uses the concept of maximum likelihood estimates in order to maximize the detection rate for a fixed false alarm rate. The GLR process is accomplished through the following steps:
1. The null and alternate hypotheses are defined, and their respective likelihood functions are derived.
2. Any unknown parameters in the alternate hypothesis, for example, the mean and/or variance, are computed from the testing data using their MLEs.
3. The log-likelihood ratio of the alternate to null hypotheses is then computed, and its maximum value is calculated, which maximizes the detection rate.
Univariate GLR charts can be designed based on the type of fault that needs to be detected. Most processes experience shifts in the mean and/or shifts in the variance, and the three corresponding GLR charts are explained next.
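When residuals are collected from a process under normal operating conditions, the likelihood function at time $k$, derived from a normal distribution, can be defined as follows:
$$L(\infty \mid X_1, \ldots, X_k) = (2\pi\sigma_0^2)^{-k/2} \exp\left(-\frac{1}{2\sigma_0^2}\sum_{i=1}^{k}(X_i - \mu_0)^2\right), \quad (3)$$
where $X_1, \ldots, X_k$ are the observations, and $\mu_0$ and $\sigma_0^2$ are the mean and variance of the process variable measured under normal operating conditions, respectively.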
3.1.1 Univariate GLR chart for a shift in the mean
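If a shift in the mean has occurred at time $\tau$, from $\mu_0$ to $\mu_1$, the likelihood function of the alternate hypothesis is defined as follows:
$$L(\tau, \mu_1 \mid X_1, \ldots, X_k) = (2\pi\sigma_0^2)^{-k/2} \exp\left(-\frac{1}{2\sigma_0^2}\left[\sum_{i=1}^{\tau}(X_i - \mu_0)^2 + \sum_{i=\tau+1}^{k}(X_i - \mu_1)^2\right]\right), \quad (4)$$
where $X_1, \ldots, X_k$ are the observations available at time $k$ and $\sigma_0^2$ is the variance under normal operating conditions. Since the magnitude of the new mean is unknown, its MLE can be computed using testing data as follows:
$$\hat{\mu}_{1,\tau,k} = \frac{1}{k-\tau}\sum_{i=\tau+1}^{k} X_i. \quad (5)$$
Taking the log-likelihood ratio of the alternate to the null hypothesis and maximizing over the candidate change points within a window of length $m$ gives the GLR statistic:
$$R_{m,k} = \max_{\max(0,\,k-m) \le \tau < k} \frac{k-\tau}{2\sigma_0^2}\left(\hat{\mu}_{1,\tau,k} - \mu_0\right)^2. \quad (6)$$
It is not necessary to store the entire history of previous data in order to compute the MLEs; a window length of about 400 has been reported as sufficient to provide reliable results. Therefore, a window length of 400 was used throughout this work for all GLR charts.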
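A windowed GLR chart for a mean shift can be sketched as below. It assumes independent Gaussian observations with known in-control mean and standard deviation, in which case the log-likelihood ratio for a candidate change point reduces to the squared sum of deviations divided by twice the variance times the number of post-change samples. The function name and the cumulative-sum bookkeeping are illustrative choices, not taken from the source.

```python
import numpy as np

def glr_mean(x, mu0, sigma0, window=400):
    """GLR statistic for a sustained mean shift, maximized over recent change points."""
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x - mu0)))   # running sum of deviations
    stat = np.zeros(len(x))
    for k in range(1, len(x) + 1):
        tau = np.arange(max(0, k - window), k)           # candidate change points
        n = k - tau                                      # samples after each candidate
        dev_sum = csum[k] - csum[tau]                    # sum of (x_i - mu0), i = tau+1..k
        # (k - tau) * (mu_hat - mu0)^2 / (2 sigma0^2), with mu_hat - mu0 = dev_sum / n
        stat[k - 1] = np.max(dev_sum ** 2 / (2.0 * sigma0 ** 2 * n))
    return stat
```

For example, `glr_mean([0, 0, 2, 2], mu0=0.0, sigma0=1.0)` gives `[0, 0, 2, 4]`: the statistic stays at zero before the shift and grows as evidence for the new mean accumulates.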
3.1.2 Univariate GLR chart for a shift in the variance
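If only a shift in the variance has occurred at time $\tau$, from $\sigma_0^2$ to $\sigma_1^2$, the likelihood function of the alternate hypothesis for this case is defined as follows:
$$L(\tau, \sigma_1^2 \mid X_1, \ldots, X_k) = (2\pi\sigma_0^2)^{-\tau/2}(2\pi\sigma_1^2)^{-(k-\tau)/2} \exp\left(-\frac{1}{2\sigma_0^2}\sum_{i=1}^{\tau}(X_i - \mu_0)^2 - \frac{1}{2\sigma_1^2}\sum_{i=\tau+1}^{k}(X_i - \mu_0)^2\right), \quad (7)$$
where $X_1, \ldots, X_k$ are the observations available at time $k$ and $\mu_0$ is the mean under normal operating conditions. From a quality control standpoint, only increases in the variance are of concern, as larger variations imply that the product is being manufactured with quality further away from the target. Since the magnitude of the new variance is unknown, its MLE can be computed using testing data as follows:
$$\hat{\sigma}^2_{1,\tau,k} = \max\left\{\sigma_0^2,\; \frac{1}{k-\tau}\sum_{i=\tau+1}^{k}(X_i - \mu_0)^2\right\}. \quad (8)$$
The corresponding GLR statistic, maximized over a window of length $m$, is:
$$R_{m,k} = \max_{\max(0,\,k-m) \le \tau < k} \frac{k-\tau}{2}\left[\frac{\hat{\sigma}^2_{1,\tau,k}}{\sigma_0^2} - 1 - \ln\frac{\hat{\sigma}^2_{1,\tau,k}}{\sigma_0^2}\right]. \quad (9)$$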
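The variance chart can be sketched in the same way. The snippet assumes Gaussian observations with known in-control mean and standard deviation, and clamps the variance estimate at the in-control value so that only increases are flagged; the function name is hypothetical.

```python
import numpy as np

def glr_variance(x, mu0, sigma0, window=400):
    """GLR statistic for a sustained increase in variance (mean assumed unchanged)."""
    x = np.asarray(x, dtype=float)
    csq = np.concatenate(([0.0], np.cumsum((x - mu0) ** 2)))  # running sum of squares
    var0 = sigma0 ** 2
    stat = np.zeros(len(x))
    for k in range(1, len(x) + 1):
        tau = np.arange(max(0, k - window), k)                # candidate change points
        n = k - tau
        var1 = np.maximum(var0, (csq[k] - csq[tau]) / n)      # MLE, increases only
        ratio = var1 / var0
        stat[k - 1] = np.max(0.5 * n * (ratio - 1.0 - np.log(ratio)))
    return stat
```

When the estimated variance does not exceed the in-control variance, the ratio equals one and the term inside the maximum is exactly zero, so decreases in variance never raise the statistic.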
3.1.3 Univariate GLR chart for a shift in the mean and/or variance
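Since most processes can experience shifts in both the mean and the variance, a GLR statistic that is capable of detecting either type of shift can be designed. The likelihood function of the alternate hypothesis for this case is defined as follows:
$$L(\tau, \mu_1, \sigma_1^2 \mid X_1, \ldots, X_k) = (2\pi\sigma_0^2)^{-\tau/2}(2\pi\sigma_1^2)^{-(k-\tau)/2} \exp\left(-\frac{1}{2\sigma_0^2}\sum_{i=1}^{\tau}(X_i - \mu_0)^2 - \frac{1}{2\sigma_1^2}\sum_{i=\tau+1}^{k}(X_i - \mu_1)^2\right). \quad (10)$$
As previously stated, from a quality control standpoint only an increase in the variance is of concern, and the MLE for the variance can be computed as follows:
$$\hat{\sigma}^2_{1,\tau,k} = \max\left\{\sigma_0^2,\; \tilde{\sigma}^2_{\tau,k}\right\}, \qquad \tilde{\sigma}^2_{\tau,k} = \frac{1}{k-\tau}\sum_{i=\tau+1}^{k}\left(X_i - \hat{\mu}_{1,\tau,k}\right)^2, \quad (11)$$
where $\hat{\mu}_{1,\tau,k} = \frac{1}{k-\tau}\sum_{i=\tau+1}^{k} X_i$ is the MLE of the shifted mean.
If there are no shifts in the mean for testing data, the variance is computed as follows:
$$\hat{\sigma}^2_{\mu_0,\tau,k} = \frac{1}{k-\tau}\sum_{i=\tau+1}^{k}(X_i - \mu_0)^2. \quad (12)$$
In this case, the GLR statistic is designed to simultaneously monitor shifts in both the mean and the variance, and can be computed by taking the log-likelihood ratio of Eqs. (3) and (10), resulting in the following equation:
$$R_{m,k} = \max_{\max(0,\,k-m) \le \tau < k} \frac{k-\tau}{2}\left[\frac{\hat{\sigma}^2_{\mu_0,\tau,k}}{\sigma_0^2} - \frac{\tilde{\sigma}^2_{\tau,k}}{\hat{\sigma}^2_{1,\tau,k}} - \ln\frac{\hat{\sigma}^2_{1,\tau,k}}{\sigma_0^2}\right]. \quad (13)$$
It is important to note that for this particular GLR method, two parameters, that is, the mean and the variance, have to be estimated using their MLEs, since the type of shift is unknown.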
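The combined chart can be sketched as follows, again assuming Gaussian observations with known in-control parameters. Both the post-change mean and the post-change variance are replaced by their estimates over each candidate segment, with the variance clamped at the in-control value; the function name and the cumulative-sum shortcut are illustrative assumptions.

```python
import numpy as np

def glr_mean_variance(x, mu0, sigma0, window=400):
    """GLR statistic for a shift in the mean and/or an increase in the variance."""
    d = np.asarray(x, dtype=float) - mu0
    c1 = np.concatenate(([0.0], np.cumsum(d)))          # running sum of deviations
    c2 = np.concatenate(([0.0], np.cumsum(d ** 2)))     # running sum of squared deviations
    var0 = sigma0 ** 2
    stat = np.zeros(len(d))
    for k in range(1, len(d) + 1):
        tau = np.arange(max(0, k - window), k)          # candidate change points
        n = k - tau
        s0 = c2[k] - c2[tau]                            # sum of squares about mu0
        # Sum of squares about the segment mean (clipped against round-off)
        s1 = np.maximum(s0 - (c1[k] - c1[tau]) ** 2 / n, 0.0)
        var1 = np.maximum(var0, s1 / n)                 # variance MLE, increases only
        llr = s0 / (2.0 * var0) - s1 / (2.0 * var1) - 0.5 * n * np.log(var1 / var0)
        stat[k - 1] = llr.max()
    return stat
```

When the segment variance does not exceed the in-control variance, the expression collapses to the mean-shift statistic, which is why a pure mean shift with no extra spread yields the same values as the dedicated mean chart.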
3.1.4 Multivariate GLR chart for a shift in the mean
Since using a univariate GLR chart may not always be practical, Wang and Reynolds introduced the multivariate GLR chart, designed specifically to monitor shifts in the process mean for multivariate applications. In this case, the GLR statistic is defined as follows:
$$R_{m,k} = \max_{\max(0,\,k-m) \le \tau < k} \frac{k-\tau}{2}\left(\hat{\mu}_{1,\tau,k} - \mu_0\right)^T \Sigma_0^{-1} \left(\hat{\mu}_{1,\tau,k} - \mu_0\right),$$
where $\mu_0$ is the multivariate mean vector of the process under normal operating conditions, $\hat{\mu}_{1,\tau,k}$ is the MLE of the process mean after a sustained shift at time index $\tau$, computed over a sample window of maximum length $m$, and $\Sigma_0$ is the process covariance matrix under normal conditions.
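A multivariate version of the mean-shift chart can be sketched along the same lines, assuming multivariate Gaussian observations with known in-control mean vector and covariance matrix. The function name is hypothetical, and the explicit matrix inverse is used only for brevity.

```python
import numpy as np

def glr_multivariate_mean(X, mu0, cov0, window=400):
    """Multivariate GLR statistic for a sustained shift in the mean vector."""
    D = np.asarray(X, dtype=float) - mu0
    cov_inv = np.linalg.inv(cov0)
    csum = np.vstack([np.zeros(D.shape[1]), np.cumsum(D, axis=0)])
    stat = np.zeros(D.shape[0])
    for k in range(1, D.shape[0] + 1):
        tau = np.arange(max(0, k - window), k)          # candidate change points
        n = (k - tau)[:, None]
        shift = (csum[k] - csum[tau]) / n               # estimated mean shift per candidate
        quad = np.einsum("ij,jk,ik->i", shift, cov_inv, shift)
        stat[k - 1] = np.max(0.5 * n[:, 0] * quad)
    return stat
```

With a single variable and unit variance this reduces to the univariate mean chart, which gives a convenient sanity check.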
3.2 Fault detection using PCA-based GLR
The PCA method introduced in Section 2 is commonly utilized by many industries. Therefore, it is necessary to integrate the simplicity of the PCA method with the advantages brought forward by the GLR charts, so that it can be easily applied to monitor processes online. Figure 1 illustrates the fault detection algorithm utilized in this work.
PCA is used to model the available data. The different GLR charts can then be applied to the residuals produced by the PCA model in order to determine whether the process is operating under normal or faulty conditions. The fault detection threshold limits are obtained from an empirical distribution of the GLR statistic computed under normal operating conditions. The residual space is typically better at detecting faults of smaller magnitude.
4. Moving window interval data aggregation
Data utilized in the construction of a PCA model may be of two types depending on the application being monitored: single-valued, and interval-valued. Single-valued data can be directly obtained from sensors measuring particular variables in a process, while interval-valued data is aggregated or artificially generated from batch single-valued measurements, thereby resulting in a range of possible measurement values for a given process variable at one time instant .
An interval is defined by a lower and an upper bound, such as $[\underline{x}, \overline{x}]$.
Initially, the use of interval data was motivated by the need to quickly and efficiently monitor large datasets, in addition to its ability to deal with missing values without the need to remove entire samples. Generating intervals by aggregation is a form of batch processing, which may not always be ideal. The ability to monitor faults in real time is typically much more desirable from a quality and safety standpoint. It also becomes impractical to use batch aggregation for processes with a low sample size or low sampling frequency.
As a result, interval data aggregation must be adapted for real-time monitoring purposes. One way to do that would be to use a moving window aggregation technique, such that any observed sample is aggregated with previously gathered samples, if any, in the defined window size. This allows for the generation and processing of interval data in real-time, without the need to wait for multiple samples to be observed before processing.
As expected, however, this method suffers from some drawbacks relative to its batch aggregation counterpart. The moving window approach may cause smearing along the detection statistic, leading to higher false alarms and lower detection rates. This is especially true for large window sizes, as is the case for most methods which apply that approach. The problem can be mitigated by limiting the window size to reasonable limits, whilst also adjusting the threshold in order to meet the desired false alarm rates of the process.
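The moving window aggregation can be sketched as below. The source does not state how the interval bounds are formed from the samples in the window, so this example assumes the bounds are the minimum and maximum of each variable over the current sample and the preceding ones; the function name is hypothetical.

```python
import numpy as np

def moving_window_intervals(X, window=10):
    """Build one interval per sample from the current and preceding samples."""
    X = np.asarray(X, dtype=float)
    lower = np.empty_like(X)
    upper = np.empty_like(X)
    for i in range(X.shape[0]):
        block = X[max(0, i - window + 1): i + 1]   # shorter at the start of the record
        lower[i] = block.min(axis=0)               # assumed lower bound: window minimum
        upper[i] = block.max(axis=0)               # assumed upper bound: window maximum
    return lower, upper
```

Because every sample produces its own interval, the number of samples is preserved and an interval is available as soon as each new observation arrives.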
4.1 Integration with PCA-based GLR
Interval principal component analysis (IPCA) methods are an extension to the classical PCA method, and they have been explored in literature for fault detection and isolation examples [29, 30]. In this work, three IPCA methods will be briefly introduced, before discussing our proposed method of integrating the moving window interval approach to the PCA-based GLR technique.
Centers IPCA (CIPCA) was introduced by Cazes et al. , where the idea was to only apply PCA to the matrix of interval centers. This method focuses on the variation between the intervals of a dataset, rather than the variations within them [18, 32]. Midpoint-Radii IPCA (MRIPCA) was developed by Lauro et al. [33, 34, 35, 36], where PCA models are separately generated for the centers and radii matrices of the interval training dataset. Finally, the Symbolic Covariance IPCA (SCIPCA) method was introduced by Le-Rademacher et al. [18, 32] as a way to better represent the range and variability found in interval data.
In this paper, the integration of the moving window aggregation to PCA-based GLR will be as follows. After generating an interval sample for each single-valued sample, the single-valued matrices of interval centers and radii are extracted. The matrices are then concatenated along the variables dimension, so as to maintain the number of samples, but double the number of variables. This is similar to the MRIPCA method, except it avoids the need to apply PCA twice, eliminating any additional processing complexity.
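The concatenation step can be illustrated with a short sketch, using the usual definitions of an interval's center (the midpoint of its bounds) and radius (half its width). The function name and argument layout are assumptions for the example.

```python
import numpy as np

def centers_radii_matrix(lower, upper):
    """Stack interval centers and radii side by side as single-valued variables."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    centers = 0.5 * (upper + lower)
    radii = 0.5 * (upper - lower)
    return np.hstack([centers, radii])   # same number of rows, twice the columns
```

The resulting matrix can be passed to a single PCA model, which is what avoids fitting separate models for the centers and the radii.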
5. Illustrative examples
This section evaluates the performance of the three PCA-based GLR charts described in Section 3, and the moving window aggregation method discussed in Section 4. The PCA-based GLR charts are evaluated under different fault scenarios through two illustrative examples: a simulated synthetic data set, and the benchmark Tennessee Eastman Process (TEP). Three fault detection metrics are used to evaluate the performance of each univariate chart: missed DR (which is equal to 100-DR), FAR, and average out-of-control run length (ARL1). Finally, the moving window interval aggregation method, in tandem with the PCA-based multivariate GLR chart, is analyzed using the benchmark TEP, and the results are tabulated and compared to the single-valued multivariate GLR chart.
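The missed DR and ARL1 metrics can be computed from a boolean alarm sequence as sketched below. The run length convention used here (number of samples from the fault onset up to and including the first alarm) and the function names are assumptions; ARL1 is then the average run length over repeated realizations.

```python
import numpy as np

def missed_detection_rate(alarms, is_faulty):
    """Percentage of faulty samples that did not raise an alarm (100 - DR)."""
    return 100.0 * np.mean(~alarms[is_faulty])

def run_length(alarms, fault_start):
    """Samples from fault onset up to and including the first alarm (NaN if none)."""
    hits = np.flatnonzero(alarms[fault_start:])
    return float(hits[0] + 1) if hits.size else float("nan")

def arl1(alarm_runs, fault_start):
    """Average out-of-control run length over several realizations."""
    return float(np.nanmean([run_length(a, fault_start) for a in alarm_runs]))
```

Realizations in which the fault is never detected are skipped by `np.nanmean` in this sketch; a study could instead cap them at the length of the faulty region.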
5.1 Simulated synthetic data example
The purpose of this example is to use a simple linear model to compare and evaluate the performance of the different PCA-based univariate GLR charts. The linear data set can be generated using the following model:
where, , , and , are uniformly distributed random variables with ranges, , , and , respectively, while the noise follows a normal distribution with zero-mean and standard deviation of 0.2 .
The linear model is used to generate 6000 observations, split into training and testing data sets of 3000 observations each. The training data are used to train the PCA model, while the testing data are used to evaluate the performance of all techniques using three cases of faults: a shift in the mean, a shift in the variance, and a simultaneous shift in both.
Five charts are evaluated and compared: the PCA-based T2 and Q charts, and the three different PCA-based univariate GLR charts. The faulty region is highlighted in light blue for all figures, and the fault detection threshold limits for all charts are represented by the red dotted line. For each case a Monte-Carlo simulation of 1000 realizations is carried out in order to obtain meaningful results, so that conclusions can be drawn.
5.1.1 Case 1: a shift in the mean
For this case, a shift in the mean was introduced between observations 1501 and 3000 in the testing data set. This fault size was chosen as most conventional techniques are unable to detect a fault of this magnitude. Faults of higher magnitude would likely provide misleading results and exaggerate the robustness of the method in question, leading to a biased comparison.
As can be seen in Figure 2, the T2 and Q charts are unable to detect the entirety of the fault. In contrast, two GLR charts (Figure 3a and c) are able to detect most of the fault, while the GLR chart designed to monitor a shift in the variance (Figure 3b) could not detect that a shift in the mean was present.
Examining the summary of the fault detection results (Table 1), it can be observed that the GLR chart designed to monitor shifts in the mean (Figure 3a) provided the lowest missed DR and ARL1 values of all the charts.
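Table 1. Summary of fault detection results (case 1).
| Metric | PCA-based T2 | PCA-based Q | PCA-based GLR (to monitor mean) | PCA-based GLR (to monitor variance) | PCA-based GLR (to monitor mean and/or variance) |
|---|---|---|---|---|---|
| Missed DR (%) | 95.3 | 94.5 | 0.4 | 85.1 | 31.5 |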
The relatively high missed DR of the GLR chart designed to simultaneously monitor shifts in both the mean and variance (Figure 3c) can be attributed to the fact that two parameters need to be estimated from the available data while maximizing the GLR statistic, which makes it harder to detect a shift in a single parameter as efficiently.
5.1.2 Case 2: a shift in the variance
For this case, an increase in the variance (double that of the training data) was introduced between observations 1501 and 3000 in the testing data set. This shift in the variance is too small for detection by most conventional techniques.
As can be seen in Figure 4, the T2 and Q charts are unable to detect the entirety of the fault. In contrast, two GLR charts (Figure 5b and c) were able to detect most of the fault, while the GLR chart designed to monitor a shift in the mean (Figure 5a) could not detect it as well. Examining the summary of the results (Table 2), it can be observed that the GLR chart designed to monitor a shift in the variance (Figure 5b) provided the lowest missed DR and ARL1 values of all the charts.
| Metric | PCA-based T2 | PCA-based Q | PCA-based GLR (to monitor mean) | PCA-based GLR (to monitor variance) | PCA-based GLR (to monitor mean and/or variance) |
|---|---|---|---|---|---|
| Missed DR (%) | 90.2 | 88.6 | 47.5 | 0.7 | 33.0 |
5.1.3 Case 3: a shift in the mean and/or variance
For this case, a simultaneous shift in the mean and an increase in the variance (double that of the training data) were introduced between observations 1501 and 3000 in the testing data set.
As can be seen in Figure 6, the T2 and Q charts are once more unable to detect the entirety of the fault. Although it might seem that all three GLR charts (Figure 7) are able to detect most of the fault, closer inspection of the results summarized in Table 3 shows that the GLR charts designed to independently detect a shift in the mean (Figure 7a) and in the variance (Figure 7b) provide significantly lower missed DR and ARL1 values than the chart designed to monitor shifts in both (Figure 7c).
| Metric | PCA-based T2 | PCA-based Q | PCA-based GLR (to monitor mean) | PCA-based GLR (to monitor variance) | PCA-based GLR (to monitor mean and/or variance) |
|---|---|---|---|---|---|
| Missed DR (%) | 86.7 | 84.5 | 0.4 | 0.4 | 24.2 |
The main conclusion from this example is that if a process is expected to experience shifts in the mean and/or variance, it is more beneficial to run the PCA-based GLR charts designed to independently monitor shifts in the mean and variance as two parallel charts, rather than using the GLR chart designed to simultaneously monitor both. Based on this conclusion, only the former two GLR charts will be used for the next example.
5.2 Tennessee Eastman Process (TEP)
In order to assess the feasibility of using two separate GLR charts to monitor shifts in the process mean and variance, their performance has to be evaluated using real data. Many authors utilize the Tennessee Eastman Process (TEP) in order to evaluate the performance of their techniques [17, 38, 39]. The Tennessee Eastman Process is a realistic simulation of an actual chemical process that consists of a reactor, condenser, stripper, compressor, and separator, and is widely accepted as a benchmark for fault detection .
The Tennessee Eastman Process contains a bank of pre-defined faults that can be utilized by authors in order to assess the performance of their developed fault detection algorithms. More information on the Tennessee Eastman Process, the process description, and the available bank of faults is available in literature [10, 17, 21, 38, 39].
Two fault scenarios will be examined in this work: IDV 3 and IDV 11. IDV 3 is a shift in the mean of the temperature of Feed D, while IDV 11 is random variation in the reactor cooling water inlet temperature. These two fault scenarios were selected because the conventional techniques are unable to provide the best possible detection. For both scenarios, the fault is introduced after 800 observations of normal operation. The performance of four charts is evaluated: PCA-based T2 and Q charts, and the PCA-based univariate GLR charts designed to independently monitor shifts in the mean and variance. The faulty region is highlighted in light blue in all figures.
5.2.1 IDV 3: a step fault in the mean of the temperature of feed D
For the case where there is a shift in the mean of the temperature of Feed D, the PCA-based T2 and Q charts, and the PCA-based univariate GLR charts are illustrated in Figures 8 and 9 respectively, and the fault detection results are summarized in Table 4 .
| Metric | PCA-based T2 | PCA-based Q | PCA-based GLR (to monitor mean) | PCA-based GLR (to monitor variance) |
|---|---|---|---|---|
| Missed DR (%) | 97.6 | 92.8 | 7.9 | 70.9 |
From Figure 8 it can be observed that the T2 and Q charts are unable to detect the entirety of the fault, while the GLR chart designed to monitor shifts in the mean (Figure 9a) is able to detect most of the fault, and provides the lowest missed DR (Table 4). Although the T2 chart returns a low ARL1 value, it does not detect the fault efficiently, and the low ARL1 value can be attributed to random noise.
5.2.2 IDV 11: random variation in the reactor cooling water inlet temperature
For the case where there is random variation in the reactor cooling water inlet temperature, the T2 and Q charts, and the GLR charts are illustrated in Figures 10 and 11 respectively, and the fault detection results are summarized in Table 5 .
| Metric | PCA-based T2 | PCA-based Q | PCA-based GLR (to monitor mean) | PCA-based GLR (to monitor variance) |
|---|---|---|---|---|
| Missed DR (%) | 9.9 | 22.3 | 2.3 | 1.9 |
Although it might seem like the T2 and Q charts ( Figure 10 ) are able to detect most of the fault, they still have higher missed DR than both GLR charts ( Figure 11 ). The GLR chart designed to monitor shifts in the variance provides the lowest missed DR from the charts that were compared.
From this example we can conclude that the PCA-based GLR charts are able to provide improved fault detection results over the conventional PCA-based T2 and Q charts. The improved results can be attributed to the use of MLEs to estimate the values of the unknown parameters used to maximize the GLR statistic, allowing for the best possible DR to be achieved for a fixed FAR. This example also demonstrates that the GLR charts can be easily designed and utilized to monitor chemical processes, such as the TEP.
5.2.3 IDV 3 and IDV 11: single-valued vs. interval-valued multivariate GLR chart
For the final case study, the moving window interval aggregation method is tested for the same fault scenarios tested previously for the TEP: IDV 3 and IDV 11. A smaller sample window size of 10 samples is used for the multivariate GLR chart in order to highlight the difference between using single and interval-valued data more clearly.
The interval aggregation window size was set at 10 samples. The IDV 3 and IDV 11 scenarios for both data types are shown in Figures 12 and 13 , and the metrics for each method are tabulated in Table 6 .
| Metric | IDV 3 Single-valued multivariate GLR | IDV 3 Interval-valued multivariate GLR | IDV 11 Single-valued multivariate GLR | IDV 11 Interval-valued multivariate GLR |
|---|---|---|---|---|
| Missed DR (%) | 15.1 | 0.0 | 2.0 | 0.0 |
There are two major observations to be made from the results. First, the use of the multivariate GLR chart allowed for a more stable FAR for all cases due to the presence of a single statistic to monitor for all variables, as opposed to the one for each variable when using the univariate GLR charts. Second, the missed DR when using interval data was significantly lower than that for single-valued data, reaching perfect performance levels of zero missed DR for both scenarios.
The latter observation is attributed to the interval data, and especially to the method of generation, where the centers and radii are used as independent variables in the same dataset. This method of aggregation helps the PCA model account for shifts in the mean and variance respectively, similar to the univariate GLR chart outlined in Section 3.1.3. However, it does so without the need to tune any extra parameters, because a fault in the centers is likely to be caused by a shift in the mean, while a fault in the radii is likely to be caused by a shift in the variance.
6. Conclusions
In this chapter, the performance of GLR charts was compared to conventional fault detection statistics, specifically the Q and T2 statistics, and the integration of interval-valued data into real-time process monitoring was explored. The performance of different PCA-based univariate GLR charts was examined using single-valued data through two illustrative examples: simulated synthetic data, and the Tennessee Eastman Process. The performance of the moving window interval aggregation method was also evaluated alongside that of single-valued data for the multivariate GLR chart.
The results demonstrate that in order to monitor processes that may experience shifts in the mean and/or variance, the best performance is achieved by implementing the two respective univariate GLR charts separately in parallel, rather than the single chart designed to simultaneously detect shifts in both, as the simultaneous estimation of two parameters is unable to provide the best possible fault detection performance. Moreover, the moving window interval aggregation method, when combined with the multivariate GLR chart, was able to provide a perfectly stable statistic, with an unwavering false alarm rate, in addition to the best possible performance in detecting shifts in the mean and variance for two scenarios of the Tennessee Eastman Process.
This work was made possible by NPRP grant NPRP7-1172-2-439 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.