Data observed from environmental and engineering processes are usually noisy and correlated in time, which makes fault detection more difficult because noise degrades detection quality. Multiscale representation of data using wavelets is a powerful feature extraction tool that is well suited to denoising and decorrelating time series data. In this chapter, we combine the advantages of multiscale partial least squares (MSPLS) modeling with those of the univariate exponentially weighted moving average (EWMA) monitoring chart, which results in an improved fault detection system, especially for detecting small faults in highly correlated, multivariate data. Toward this end, we apply the EWMA chart to the output residuals obtained from the MSPLS model. Simulated distillation column data show that the proposed method significantly improves fault detection compared with the conventional partial least squares (PLS)‐based Q and EWMA methods and the MSPLS‐based Q method.
- data uncertainty
- multiscale representation
- fault detection
- data‐driven approaches
- statistical monitoring schemes
1. Introduction
Monitoring chemical and environmental processes has attracted increasing attention from researchers and practitioners seeking to improve product quality and enhance process safety. For example, detecting anomalies in chemical or environmental plants affects not only the productivity and profitability of these plants, but also the safety of people [1, 2]. To enhance process operation, the process should be monitored efficiently so that abnormal events that may degrade product quality, operation reliability, or profitability are detected correctly and the necessary corrections can be made. Fault detection and diagnosis represent two vital components of process monitoring (see Figure 1), during which abnormal events are first identified and then isolated to ensure that they can be appropriately handled [2, 3]. Generally, faults in modern automatic processes are difficult to avoid and may result in serious process degradation. Even small deviations in process parameters can result in lost time, and catastrophic failure can bring devastating health, safety, and financial consequences. Because of this, engineers must keep improving the reliability of their processes, watching carefully for signs of anomalies that could lead to disaster. It is therefore crucial to detect and identify any possible faults or failures in the system as early as possible [2, 4, 5].
Keeping an automated process running smoothly and safely while producing the desired results remains a major challenge in many sectors. Various fault detection techniques have been developed for the safe operation of systems or processes. These techniques fall into two main types: process‐history‐based approaches and model‐based approaches, as shown in Figure 2. Model‐based approaches compare analytically computed outputs with measured values and signal an alarm when large differences are detected [2, 6, 7]. Unfortunately, the effectiveness of model‐based fault‐detection approaches relies on the accuracy of the models used. When no process model is available, model‐free or process‐history‐based methods have been used successfully in process monitoring because they can deal effectively with highly correlated process variables [8, 9]. Such methods require minimal a priori knowledge about process physics, but depend on the availability of quality input data. Process‐history‐based methods use implicit empirical models derived from analysis of available data and rely on computational intelligence and machine learning methods [10–12]. In the last four decades, process‐history‐based methods such as principal component analysis (PCA) and partial least squares (PLS) have become increasingly important in statistical process monitoring. They have been extensively applied in the field of chemometrics [5, 13, 14]. In contrast to the classical univariate statistical process monitoring tools, these approaches take the correlations between variables into account and monitor a set of correlated variables simultaneously. Moreover, by projecting the original measurements into a latent subspace, latent variables (LVs) are monitored in a reduced‐dimensional space. A PCA or PLS model is built on good historical data of normal process operation [15, 16]. This model can then be used to monitor or predict the future behavior of the process.
However, most processes are in a dynamic state, with various events occurring such as abrupt process changes, slow drifts, bad measurements due to sensor failures, and human errors. Data from these processes are not only cross‐correlated, but also autocorrelated. Applying conventional latent variable regression (LVR) methods directly to dynamic systems results in false alarms and makes the monitoring scheme insensitive when detecting and discriminating between different kinds of events. In addition, noisy data and model uncertainties negatively affect the performance of fault detection methods. Wavelet‐based multiscale representation of data has been shown to provide effective noise‐feature separation in the data, to approximately decorrelate autocorrelated data, and to transform the data to better follow the Gaussian distribution. Multiscale representation of data using wavelets has been widely used for data denoising, compression, and process monitoring [18–21].
The detection of incipient faults is crucial for maintaining the normal operations of a system by providing early fault warnings. The problem is that incipient anomalies are often too weak to be detected by conventional monitoring methods. The objective of this chapter is to extend existing fault detection techniques to take into account the uncertainty of the data. To this end, multiscale data representation, a powerful feature extraction tool, is used to reduce false alarms by improving noise‐feature separation and by decorrelating autocorrelated measurement errors. Specifically, multiscale partial least squares (MSPLS)‐based exponentially weighted moving average (EWMA) fault detection techniques are developed. The overarching goal of this work is to tackle multivariate challenges in process monitoring by merging the advantages of the EWMA chart and multiscale PLS modeling. Simulated distillation column data show that the MSPLS‐EWMA approach detects small faults significantly better than the PLS‐EWMA fault detection approach.
The remainder of this chapter is organized as follows. Section 2 gives a brief overview of PLS and the multiscale PLS approach. Section 3 briefly presents the EWMA chart. Section 4 presents the proposed MSPLS‐EWMA fault‐detection procedure. Section 5 applies the proposed fault‐detection procedure to a simulated distillation column process. Finally, Section 6 concludes the chapter.
2. Preliminary materials
2.1. Partial least squares (PLS)‐based charts
The objective of PLS models is to find relations between input and output data blocks by relating their latent variables. A detailed description of the PLS technique is given in Ref. . This data‐driven empirical modeling approach is extremely useful when either a first‐principles or analytical model is difficult to obtain or the measured variables are highly correlated (collinear). PLS methods have been extensively researched and applied in the chemometrics field.
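Consider an input data matrix $\mathbf{X} \in \mathbb{R}^{n \times m}$ and an output data matrix $\mathbf{Y} \in \mathbb{R}^{n \times p}$, where $n$ is the number of samples or observations, and $m$ and $p$ are the numbers of input and output variables, respectively. The objective of PLS is to maximize the covariance between linear combinations of the columns of $\mathbf{X}$ and $\mathbf{Y}$. A PLS model consists of an outer model and an inner model [15, 23] (see Figure 3). The input and output matrices are related to the LVs via the outer model:
$$\mathbf{X} = \hat{\mathbf{X}} + \mathbf{E} = \mathbf{T}\mathbf{P}^{T} + \mathbf{E}, \qquad \mathbf{Y} = \hat{\mathbf{Y}} + \mathbf{F} = \mathbf{U}\mathbf{Q}^{T} + \mathbf{F} \tag{1}$$
where $\hat{\mathbf{X}}$ and $\hat{\mathbf{Y}}$ are the approximated data matrices of $\mathbf{X}$ and $\mathbf{Y}$, respectively, and the matrices $\mathbf{T}$ and $\mathbf{U}$ consist of the retained LVs of the input and output data, respectively. $\mathbf{E}$ and $\mathbf{F}$ are the residual matrices that contain the unexplained variance of the input and output data, respectively, and $\mathbf{P}$ and $\mathbf{Q}$ are the loading matrices of $\mathbf{X}$ and $\mathbf{Y}$, respectively. In practice, choosing a proper number of LVs is an important step in PLS modeling. If all LVs are used, the model may fit the noise, which reduces its predictive ability. Cross‐validation can be used to determine a proper number of LVs. The inner model can be computed as
$$\mathbf{U} = \mathbf{T}\mathbf{B} + \mathbf{H} \tag{2}$$
where $\mathbf{B}$ is a regression matrix and $\mathbf{H}$ is a residual matrix. The information in $\mathbf{Y}$ can then be expressed as
$$\mathbf{Y} = \mathbf{T}\mathbf{B}\mathbf{Q}^{T} + \mathbf{F}^{*} \tag{3}$$
where the matrix $\mathbf{F}^{*}$ is the residual that represents the unexplained variance.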
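The outer and inner relations above can be illustrated with a small numerical sketch. The chapter does not state which algorithm is used to estimate the PLS model, so the code below assumes the classical NIPALS recursion for a single output variable (PLS1) on mean‐centered data; the function names `pls1_fit` and `pls1_predict` are hypothetical and chosen only for this illustration.

```python
from math import sqrt


def pls1_fit(X, y, n_lv):
    """Fit a single-output PLS model with NIPALS (assumed algorithm).

    X is a list of n mean-centered rows with m inputs each and y is a list
    of n mean-centered outputs. Returns the per-LV weights W, the input
    loadings P and the inner regression coefficients B.
    """
    n, m = len(X), len(X[0])
    E = [row[:] for row in X]  # input residual, starts as X
    f = y[:]                   # output residual, starts as y
    W, P, B = [], [], []
    for _ in range(n_lv):
        # Weight vector: input direction with maximal covariance with f.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        # Score vector t = E w (one column of T).
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        # Input loading (one column of P) and inner regression coefficient.
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        b = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflation: remove the part explained by this LV.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - b * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        B.append(b)
    return W, P, B


def pls1_predict(model, x):
    """Predict the output for one mean-centered input row x."""
    W, P, B = model
    x = x[:]
    y_hat = 0.0
    for w, p, b in zip(W, P, B):
        t = sum(xj * wj for xj, wj in zip(x, w))  # score of the new sample
        y_hat += b * t
        x = [xj - t * pj for xj, pj in zip(x, p)]  # deflate the new sample
    return y_hat


# Small illustrative data set (made up): two centered inputs, one output.
X = [[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0], [3.0, 0.0], [-3.0, 0.0]]
y = [2.0 * a - b for a, b in X]
model = pls1_fit(X, y, n_lv=2)
residuals = [yi - pls1_predict(model, xi) for xi, yi in zip(X, y)]
```

The list `residuals` plays the role of the output residual in the model above; with as many LVs as inputs the fit coincides with ordinary least squares, which is why cross‐validation is normally used to retain fewer LVs.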
2.2. Wavelet transform
Most engineering processes generate data with multiscale properties, meaning that they contain both useful information and noise at different times and frequencies. The majority of fault detection approaches operate on time‐domain data at a single time scale and do not take the multiscale characteristics of the data into consideration. Wavelet analysis has been shown to represent data with multiscale properties, efficiently separating deterministic and stochastic features.
Multiresolution time series decomposition was initially applied by Mallat, who used orthogonal wavelet bases during data compression for image decoding. Wavelets represent a family of basis functions, localized in both time and frequency, that can be expressed as
$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right) \tag{4}$$
where $a$ represents the dilation parameter, $b$ is the translation parameter, and $\psi(t)$ is the mother wavelet. Both parameters are commonly discretized dyadically as $a = 2^{j}$, $b = 2^{j}k$, $(j, k) \in \mathbb{Z}^{2}$, so that the family of wavelets can be represented as $\psi_{jk}(t) = 2^{-j/2}\,\psi(2^{-j}t - k)$, where $j$ and $k$ are the dilation and translation parameters, respectively. Different families of basis functions are created based on their convolution with different filters, such as the Haar scaling function and the Daubechies filters [26, 27]. Dyadic discretization implies downsampling, which halves the number of coefficients with every decomposition level. However, dyadically discretized wavelets force samples at nondyadic locations to be decomposed only after a certain time delay.
The discrete wavelet transform (DWT) analyzes the signal at different scales (or over different frequency bands) by decomposing the signal at each scale into a coarse approximation (low frequency information), $A_j(t)$, and detail information (high frequency information), $D_j(t)$. The DWT employs two sets of functions: the scaling functions $\phi_{jk}(t)$ and the wavelet functions $\psi_{jk}(t)$, which are associated with the low pass filter H and the high pass filter G, respectively. The coarsest scale, $J$, is usually termed the decomposition level. Any signal $x(t)$ can be represented as a summation of the scaled and detail signals as follows:
$$x(t) = \underbrace{\sum_{k=1}^{n2^{-J}} a_{Jk}\,\phi_{Jk}(t)}_{A_J(t)} + \sum_{j=1}^{J} \underbrace{\sum_{k=1}^{n2^{-j}} d_{jk}\,\psi_{jk}(t)}_{D_j(t)} \tag{5}$$
where $j$, $k$, $J$, and $n$ represent the dilation parameter, translation parameter, number of scales, and number of observations in the original signal, respectively [28, 29]. $a_{Jk}$ and $d_{jk}$ are the scaling and the wavelet coefficients, respectively, and $A_J(t)$ and $D_j(t)$ represent the approximated signal and the detail signals, respectively. By passing the signal through a series of high and low pass wavelet filters, it is decomposed into signals at different scales, as shown in Figure 4.
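As a concrete illustration of the decomposition into one approximation and several detail signals, the following minimal sketch implements the orthonormal Haar DWT, the simplest of the filter families mentioned above. The chapter does not say which wavelet is used in its experiments, so the choice of Haar and the function names `haar_dwt` and `haar_idwt` are assumptions made only for this example. The signal length is assumed to be divisible by 2 raised to the decomposition depth.

```python
from math import sqrt


def haar_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    s = sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal, depth):
    """Return [a_J, d_J, ..., d_1]: coarsest approximation, then details."""
    approx, details = list(signal), []
    for _ in range(depth):
        approx, detail = haar_step(approx)
        details.insert(0, detail)  # keep the coarsest detail first
    return [approx] + details


def haar_idwt(coeffs):
    """Rebuild the signal from the output of haar_dwt."""
    s = sqrt(2.0)
    approx = coeffs[0]
    for detail in coeffs[1:]:
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt += [(a + d) / s, (a - d) / s]
        approx = rebuilt
    return approx


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coeffs = haar_dwt(x, depth=3)   # one approximation and three detail levels
x_back = haar_idwt(coeffs)      # matches x up to rounding error
```

Each level halves the number of coefficients, which is the downsampling effect of dyadic discretization described above.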
In the next section, we highlight the advantages of multiscale representation.
2.3. Advantages of multiscale representation
Conventional methods are referred to as time‐domain analysis methods. These methods are sensitive to impulsive oscillations and are unable to extract frequencies and patterns that may be hidden in the data. Before the introduction of multiscale wavelet analysis, mathematical tools such as Fourier transform analysis, coherence function analysis, and power spectral density analysis were used. However, these tools force the signal to take the form of the basis used for the analysis. For example, Fourier transform analysis decomposes the signal into a sum of cosine and sine functions. Multiscale representation helps overcome this problem because it examines the time and frequency domains simultaneously, whereas the Fourier transform is only capable of shifting between the time and frequency domains.
In a literature review of multiscale statistical process monitoring, Ganesan et al. state the following advantages of using wavelet coefficients in multiscale statistical process control (MSSPC) over conventional statistical process control (SPC) methods:
- The ability to separate noise from important features.
- The wavelet coefficients of autocorrelated data are approximately decorrelated at multiple scales.
- Data are closer to normality at multiple scales.
2.4. Separating noise from features
Two important applications, data compression and data denoising, can be achieved through wavelet multiscale decomposition. One of the biggest advantages of multiscale representation is its capacity to distinguish measurement noise from useful data features by applying low and high pass filters to the data during multiscale decomposition. This allows the separation of features at different resolutions or frequencies, which makes multiscale representation a better tool for filtering or denoising noisy data than traditional linear filters, such as the mean filter and the EWMA filter. Despite their popularity, linear filters rely on defining a frequency threshold above which all features are treated as measurement noise. The ability of multiscale representation to separate noise has been used not only to improve data filtering, but also to improve the prediction accuracy of several empirical modeling methods and the accuracy of state estimators.
A noisy signal is filtered by a three‐step method:
- Apply the wavelet transform to decompose the noisy signal into the time‐frequency domain.
- Threshold the detail coefficients, removing the coefficients that fall below a selected threshold.
- Transform the thresholded coefficients back into the original domain to obtain the filtered signal.
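The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: it uses the Haar wavelet, hard thresholding, and, when no threshold is supplied, the commonly used universal threshold with a median‐based noise estimate taken from the finest detail level. None of these choices is specified in the text, and the function name `haar_denoise` is hypothetical.

```python
from math import log, sqrt
from statistics import median


def haar_denoise(signal, depth, threshold=None):
    """Denoise a signal whose length is divisible by 2**depth."""
    n = len(signal)
    root2 = sqrt(2.0)

    # Step 1: decompose into a coarse approximation and detail levels.
    approx, details = list(signal), []
    for _ in range(depth):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / root2 for a, b in pairs])  # finest first
        approx = [(a + b) / root2 for a, b in pairs]

    # Step 2: hard-threshold the detail coefficients.
    if threshold is None:
        # Assumed default: universal threshold with a robust noise estimate.
        sigma = median(abs(d) for d in details[0]) / 0.6745
        threshold = sigma * sqrt(2.0 * log(n))
    details = [[d if abs(d) > threshold else 0.0 for d in level]
               for level in details]

    # Step 3: reconstruct the filtered signal, coarsest level first.
    for level in reversed(details):
        rebuilt = []
        for a, d in zip(approx, level):
            rebuilt += [(a + d) / root2, (a - d) / root2]
        approx = rebuilt
    return approx


# A step signal corrupted by a small alternating disturbance (made-up data).
clean = [0.0] * 8 + [4.0] * 8
noisy = [c + (0.1 if i % 2 == 0 else -0.1) for i, c in enumerate(clean)]
filtered = haar_denoise(noisy, depth=2, threshold=0.5)
```

Unlike a linear low pass filter, the thresholding keeps large detail coefficients, so sharp features that produce them are retained while small, noise‐like coefficients are removed.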
2.5. Multiscale PLS modeling
Data observed from environmental and engineering processes are usually noisy and correlated in time, which makes the fault detection more difficult as the presence of noise degrades fault detection quality, and most methods are developed for independent observations. Multiscale representation of data using wavelets is a powerful feature extraction tool that is well suited to denoising and decorrelating time series data.
The integrated multiscale PLS (MSPLS) modeling approach takes advantage of both latent variable regression and the denoising ability of multiscale decomposition using wavelets. This improves the prediction ability of the model, which in turn improves the fault detection methods built on it. The given input data matrix $\mathbf{X}$ and response variable $\mathbf{y}$ are decomposed at different scales using multiscale basis functions called wavelets. Let the decomposed data at scale $j$ be $\mathbf{X}_j$ and $\mathbf{y}_j$. Then, the MSPLS model developed using the decomposed data can be expressed as
$$\mathbf{y}_j = \hat{\mathbf{y}}_j + \mathbf{e}_j = \mathbf{X}_j\mathbf{b}_j + \mathbf{e}_j \tag{6}$$
where $\mathbf{X}_j$ is the filtered input data matrix at scale $j$, $\mathbf{y}_j$ is the response output vector at scale $j$, $\mathbf{b}_j$ is the vector of PLS regression coefficients estimated at that scale, and $\mathbf{e}_j$ is the MSPLS model residual at decomposition scale $j$.
However, denoising the input and output variables a priori, before developing the model, can result in poor prediction ability because features that may be important to the model are removed. Therefore, the proposed integrated MSPLS modeling approach selects the optimum decomposition depth based on the prediction ability of the developed MSPLS model. The integrated MSPLS modeling algorithm is summarized next.
- Preprocess the training and testing data so that all variables are scaled to zero mean and unit variance.
- Apply wavelet decomposition to convert the data into wavelet coefficients. This changes the data from a single scale to multiple scales, which allows multiscale modeling.
- Filter the training data at different scales using the filtering algorithm given in Section 2.4.
- Build a PLS model using the filtered data at each scale. Cross‐validation is used to determine the number of LVs.
- Use the estimated model from each scale to predict the output for the testing data and compute the cross‐validated mean square error.
- Choose the PLS model with the smallest cross‐validated mean square error as the MSPLS model.
Once an MSPLS model based on past normal operation is obtained, it can be used to monitor future deviations from normality. Two monitoring statistics, the $T^2$ and $Q$ statistics, are usually utilized for fault detection purposes. First, the Hotelling $T^2$ statistic indicates the variation within the process model in the LV subspace. Second, the $Q$ statistic, also known as the squared prediction error (SPE), monitors how well the data conform to the model (see Figure 5).
The $T^2$ statistic based on the number of retained LVs, $l$, is defined as
$$T^2 = \sum_{i=1}^{l} \frac{t_i^2}{\lambda_i} \tag{7}$$
where $t_i$ is the $i$th LV score and $\lambda_i$ is the $i$th eigenvalue of the covariance matrix of $\mathbf{X}$. The $T^2$ statistic measures the variation in the LVs only. A large change in the LV subspace is observed if some points exceed the confidence limit of the $T^2$ chart, indicating a big deviation in the monitored system. Confidence limits for $T^2$ at level $(1-\alpha)$ relate to the Fisher distribution, $F$, as follows:
$$T^2_{l,n,\alpha} = \frac{l(n-1)}{n-l}\,F_{l,n-l,\alpha} \tag{8}$$
where $F_{l,n-l,\alpha}$ is the upper $100\alpha\%$ critical point of $F$ with $l$ and $n-l$ degrees of freedom.
The squared prediction error (SPE) or $Q$ statistic, which is defined as
$$Q = \mathbf{e}^{T}\mathbf{e} = \lVert \mathbf{x} - \hat{\mathbf{x}} \rVert^{2} \tag{9}$$
captures the changes in the residual subspace. $\mathbf{e}$ represents the residual vector, which is the difference between the new observation, $\mathbf{x}$, and its prediction, $\hat{\mathbf{x}}$, via the MSPLS model. Eq. (9) expresses the $Q$ statistic directly as the total sum of squared variations in the residual vector $\mathbf{e}$. The SPE can be considered a measure of the system‐model mismatch. The confidence limits for SPE are given in Ref. . This test suggests the existence of an abnormal condition when $Q > Q_\alpha$, where $Q_\alpha$ is defined as
$$Q_\alpha = \theta_1\left[\frac{c_\alpha h_0\sqrt{2\theta_2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^{2}}\right]^{1/h_0} \tag{10}$$
$$\theta_i = \sum_{j=l+1}^{m} \lambda_j^{\,i}, \quad i = 1, 2, 3 \tag{11}$$
$$h_0 = 1 - \frac{2\theta_1\theta_3}{3\theta_2^{2}} \tag{12}$$
and $c_\alpha$ is the confidence limit for the $(1-\alpha)$ percentile in a normal distribution.
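The two monitoring statistics and the SPE threshold can be computed with a few lines of code. The sketch below assumes the standard forms of these quantities: $T^2$ as the sum of squared scores scaled by their eigenvalues, $Q$ as the squared norm of the residual vector, and the Jackson and Mudholkar approximation for the $Q$ limit built from the eigenvalues that are not retained in the model. The function names and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def hotelling_t2(scores, eigenvalues):
    """T2 of one sample from its retained LV scores and their eigenvalues."""
    return sum(t * t / lam for t, lam in zip(scores, eigenvalues))


def spe(observed, predicted):
    """Q statistic (squared prediction error) of one sample."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def q_limit(residual_eigenvalues, alpha=0.05):
    """Approximate (1 - alpha) limit for Q (Jackson-Mudholkar form, assumed).

    residual_eigenvalues are the eigenvalues NOT retained in the model.
    """
    theta1 = sum(residual_eigenvalues)
    theta2 = sum(lam ** 2 for lam in residual_eigenvalues)
    theta3 = sum(lam ** 3 for lam in residual_eigenvalues)
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)  # standard normal quantile
    bracket = (c_alpha * h0 * sqrt(2.0 * theta2) / theta1
               + 1.0
               + theta2 * h0 * (h0 - 1.0) / theta1 ** 2)
    return theta1 * bracket ** (1.0 / h0)


# Illustrative numbers only: flag a sample whose Q exceeds the limit.
q_value = spe([1.0, 2.0, 3.0], [1.0, 1.5, 2.0])
limit = q_limit([0.4, 0.2, 0.1], alpha=0.05)
is_faulty = q_value > limit
```

The $T^2$ limit is not computed here because it needs a quantile of the Fisher distribution, which the Python standard library does not provide.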
However, the MSPLS‐based $T^2$ and $Q$ approaches fail to detect small faults. Here, we use only the $Q$‐based chart as a benchmark for fault detection with PLS and MSPLS. Motivated by the power of the EWMA chart, a widely used univariate control chart, we propose it as an improved alternative for fault detection. The objective is to tackle MSPLS challenges in process monitoring by merging the advantages of the EWMA and MSPLS approaches to enhance their performance and widen their practical applicability.
3. EWMA monitoring charts
In this section, we briefly introduce the basic idea of the EWMA chart and its properties. For a more detailed discussion of EWMA charts, see Ref. . The EWMA is a statistic that gives less weight to old data and more weight to new data. EWMA charts are able to detect small shifts in the process mean because the EWMA statistic is a time‐weighted average of all previous observations. The EWMA control scheme was first introduced by Roberts and is extensively used in time series analysis. The EWMA monitoring chart is an anomaly detection technique widely used by scientists and engineers in various disciplines [6, 33, 35]. Assume that $x_1, x_2, \ldots, x_n$ are individual observations collected from a monitored process. The expression for the EWMA is
$$z_t = \lambda x_t + (1-\lambda)\,z_{t-1}, \qquad \lambda \in (0, 1] \tag{13}$$
The starting value $z_0$ is usually set to the mean of the fault‐free data, $\mu_0$. $z_t$ is the output of the EWMA and $x_t$ is the observation from the monitored process at the current time. The forgetting parameter $\lambda$ determines how fast the EWMA forgets historical data. Equation (13) can also be written as
$$z_t = \lambda \sum_{i=1}^{t} (1-\lambda)^{t-i}\,x_i + (1-\lambda)^{t}\,z_0 \tag{14}$$
where $\lambda(1-\lambda)^{t-i}$ is the weight for $x_i$, which falls off exponentially for past observations. If $\lambda$ is small, more weight is assigned to past observations, and the chart is tuned to detect small changes in the process mean efficiently. On the other hand, if $\lambda$ is large, more weight is assigned to the current observation, and the chart is more suitable for detecting large shifts. In the special case $\lambda = 1$, the EWMA is equal to the most recent observation, $x_t$, and provides the same results as the Shewhart chart. As $\lambda$ approaches zero, the EWMA approximates the CUSUM criterion, which gives equal weights to the current and historical observations.
Under fault‐free conditions, the standard deviation of $z_t$ is defined as
$$\sigma_{z_t} = \sigma_0\sqrt{\frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2t}\right]} \tag{15}$$
where $\sigma_0$ is the standard deviation of the fault‐free or preliminary data set. Therefore, in such cases, $z_t \sim \mathcal{N}(\mu_0, \sigma_{z_t}^{2})$. However, in the presence of a mean shift from $\mu_0$ to $\mu_1$ at the time point $\tau \le t$, $z_t \sim \mathcal{N}\!\left(\mu_0 + \left[1-(1-\lambda)^{t-\tau+1}\right](\mu_1-\mu_0),\ \sigma_{z_t}^{2}\right)$. The upper and lower control limits (UCL and LCL) of the EWMA chart for detecting a mean shift are $\mathrm{UCL}/\mathrm{LCL} = \mu_0 \pm L\,\sigma_{z_t}$, where $L$ is a multiplier of the EWMA standard deviation $\sigma_{z_t}$. The parameters $L$ and $\lambda$ need to be set carefully. In practice, $L$ is usually set to three, which corresponds to a false alarm rate of 0.27%. If $z_t$ is within the interval [LCL, UCL], then we conclude that the process is under control up to time point $t$. Otherwise, the process is considered out of control.
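A minimal sketch of the EWMA chart described above is given below. It assumes the standard recursion with starting value equal to the fault‐free mean and time‐varying control limits placed $L$ standard deviations of the EWMA statistic around that mean; the function name `ewma_chart`, the default values, and the example data are hypothetical.

```python
from math import sqrt


def ewma_chart(x, mu0, sigma0, lam=0.2, L=3.0):
    """Return a list of (z_t, LCL_t, UCL_t, alarm_t) for the observations x.

    mu0 and sigma0 are the mean and standard deviation of fault-free data,
    lam is the forgetting parameter in (0, 1] and L the limit multiplier.
    """
    z_prev = mu0  # starting value z_0
    chart = []
    for t, x_t in enumerate(x, start=1):
        z = lam * x_t + (1.0 - lam) * z_prev
        # Standard deviation of z_t under fault-free conditions.
        sigma_z = sigma0 * sqrt(lam / (2.0 - lam)
                                * (1.0 - (1.0 - lam) ** (2 * t)))
        lcl, ucl = mu0 - L * sigma_z, mu0 + L * sigma_z
        chart.append((z, lcl, ucl, not (lcl <= z <= ucl)))
        z_prev = z
    return chart


# A small mean shift of 1.5 standard deviations starting at the 11th sample.
data = [0.0] * 10 + [1.5] * 20
ewma_alarms = [alarm for _, _, _, alarm in ewma_chart(data, 0.0, 1.0, lam=0.2)]
shewhart_alarms = [alarm for _, _, _, alarm in ewma_chart(data, 0.0, 1.0, lam=1.0)]
```

With `lam=1.0` the statistic equals the latest observation, so the chart behaves like a Shewhart chart and a shift smaller than three standard deviations never crosses the limits, whereas the smaller `lam` accumulates the shift over several samples.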
4. Combining MSPLS model with EWMA chart: MSPLS‐EWMA
In this chapter, we combine the advantages of MSPLS modeling with those of the univariate EWMA monitoring chart, which results in an improved fault detection system, especially for detecting small faults in highly correlated, multivariate data. Toward this end, we applied EWMA charts to the output residuals obtained from the MSPLS model (see Figure 6). Indeed, under normal operation with little noise and few errors, the residuals are close to zero, while they significantly deviate from zero in the presence of abnormal events. In this work, the output residuals from MSPLS are used as a fault indicator.
As given in Eq. (6), the output vector $\mathbf{y}$ can be written as the sum of a predicted vector $\hat{\mathbf{y}}$ and a residual vector $\mathbf{e}$, i.e.,
$$\mathbf{y} = \hat{\mathbf{y}} + \mathbf{e} \tag{16}$$
The residual of the output variable, $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}$, which is the difference between the observed value of the output variable, $\mathbf{y}$, and the predicted value, $\hat{\mathbf{y}}$, obtained from the MSPLS model, is a potential indicator for fault detection. The EWMA statistic based on the residuals of the response variable can be calculated as follows:
$$z_t = \lambda e_t + (1-\lambda)\,z_{t-1} \tag{17}$$
where $e_t$ is the output residual at time $t$. In this case, since the EWMA control scheme is applied to the output residuals, a single EWMA decision function is computed to monitor the process.
5. Monitoring a simulated distillation column
In this section, the ability of the proposed MSPLS‐EWMA technique to detect faults is studied using simulated data, and the results are compared with those obtained using a traditional PLS‐EWMA method. In all monitoring charts, the red‐shaded area marks the region where the fault is injected into the test data, while the 95% control limits are plotted as horizontal dashed lines.
5.1. Description and data generation of the process
A distillation column is the most commonly used unit operation in the chemical process industries. The objective of the distillation operation is to separate a component from a mixture of components. The operation of a distillation column is very energy intensive. Therefore, monitoring such a process plays a very important role in bringing down the cost of the operation. The schematic diagram of the distillation column is shown in Figure 7.
The efficacy of the proposed fault detection strategy is tested using a distillation column simulated with the ASPEN simulation software. The input variables consist of temperature measurements at different locations of the distillation column along with the feed flow rate and the reflux flow. The light distillate from the reflux drum is considered the response variable. The nominal operating conditions and the detailed steps involved in the data generation can be found in Ref. . The 1024 generated data samples are corrupted with zero‐mean Gaussian white noise at a signal‐to‐noise ratio (SNR) of 10 dB and then used for model development and for testing the fault detection (FD) strategy. Figure 8 shows dynamic data of the distillation column, i.e., variations of the light component for changes in the reflux and feed flow. The MSPLS model is developed from the first 512 data samples, and the remaining data points are used for testing. The optimal number of LVs, determined through cross‐validation, is found to be three for the MSPLS model.
A scatter plot of the measured and predicted data is presented in Figure 9. This plot indicates a reasonable performance of the selected models.
5.2. Detection results
After a process model has been successfully identified, we can proceed with fault detection. Three types of faults in distillation columns will be considered here: abrupt, intermittent, and gradual faults.
To quantify the efficiency of the proposed strategies, we use two metrics: the false alarm rate (FAR) and the missed detection rate (MDR). The FAR is the number of normal observations that are wrongly judged as faulty (false alarms) over the total number of fault‐free observations. The MDR is the number of faulty observations that are wrongly classified as normal (missed detections) over the total number of faulty observations.
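Both metrics follow directly from comparing the alarm sequence of a chart with the known fault region. The sketch below assumes both are available as equal‐length lists of Booleans and reports the rates in percent; the function name `far_mdr` and the example values are hypothetical.

```python
def far_mdr(alarms, faulty):
    """Return (FAR, MDR) in percent.

    alarms[i] is True when the chart signals at sample i and faulty[i] is
    True when a fault is actually present at sample i.
    """
    false_alarms = sum(1 for a, f in zip(alarms, faulty) if a and not f)
    missed = sum(1 for a, f in zip(alarms, faulty) if f and not a)
    n_normal = sum(1 for f in faulty if not f)
    n_faulty = sum(1 for f in faulty if f)
    far = 100.0 * false_alarms / n_normal if n_normal else 0.0
    mdr = 100.0 * missed / n_faulty if n_faulty else 0.0
    return far, mdr


alarms = [False, True, False, False, True, True, False, True]
faulty = [False, False, False, False, True, True, True, True]
far, mdr = far_mdr(alarms, faulty)
```

In this small example one of the four fault‐free samples raises an alarm and one of the four faulty samples is missed.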
5.2.1. Case (A): abrupt fault detection
In this case study, an abrupt change is simulated by adding a small constant deviation, equal to 2% of the total variation in the affected temperature, to the temperature sensor measurements between sample times 150 and 200. In this example, testing data with a low SNR (SNR = 5) are generated to evaluate the monitoring performance of MSPLS‐EWMA and PLS‐Q. The results of the PLS‐Q and MSPLS‐Q statistics are shown in Figure 10(a) and (c), respectively, and neither chart can detect this small fault. Figure 10(b) shows that the PLS‐EWMA chart is capable of detecting this simulated fault but with many missed detections (MDR = 55% and FAR = 0.96%). Figure 10(d) shows that the MSPLS‐EWMA chart clearly detects this abrupt fault without missed detections (MDR = 0% and FAR = 0.96%).
5.2.2. Case (B): intermittent fault
In this case study, we introduce into the testing data a bias with an amplitude of 2% of the total variation in the affected temperature between samples 50 and 100, and a bias of 10% between samples 350 and 450. Figure 11(a)–(d) shows the monitoring results of the PLS‐based Q and EWMA charts and the MSPLS‐based Q and EWMA charts. Figure 11(a) shows that the PLS‐based Q chart has no power to detect this fault. From Figure 11(b), it can be seen that the MSPLS‐Q chart can detect the intermittent faults but with several missed detections. Figure 11(c) shows that the PLS‐EWMA chart can indeed detect this fault, but with some missed detections. On the other hand, the MSPLS‐EWMA chart correctly detects this intermittent fault (see Figure 11(d)). In this case study, detection performance is much enhanced when using the MSPLS‐EWMA chart compared to the others.
5.2.3. Case (C): drift failure detection
A slow drift fault is simulated by adding a ramp change with a slope of 0.01 to the temperature sensor measurements from sample 250 through the end of the testing data. Monitoring results of the PLS‐ and MSPLS‐based Q and EWMA statistics are shown in Figure 12(a)–(d). Figure 12(a) shows the monitoring results of the PLS‐Q chart, in which a signal is first given at sample 313 with a significant false alarm rate (FAR = 22.4%). Figure 12(b) shows that the PLS‐EWMA chart first detects the fault at the 290th observation. The MSPLS‐Q chart is shown in Figure 12(c), which first flags the fault at sample 323. Figure 12(d) shows that the MSPLS‐EWMA chart first detects the fault at the 288th observation. Therefore, fewer observations are needed for the MSPLS‐EWMA chart to detect the fault compared to the other charts.
This case study again confirms the superiority of the proposed approach over conventional PLS‐based fault detection. Overall, the simulated data demonstrate that significant improvement in fault detection can be obtained by combining the MSPLS model with the EWMA chart.
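The three fault scenarios of Cases (A) to (C) amount to simple modifications of a sensor signal, which can be sketched as follows. The helper names `add_bias` and `add_ramp` are hypothetical, list positions are zero‐based (so sample 150 of the text corresponds to index 149), and the bias amplitudes are passed in as plain numbers because the chapter expresses them relative to the total variation of the affected temperature, which is not reproduced here.

```python
def add_bias(signal, start, stop, amplitude):
    """Add a constant bias to signal[start:stop] (abrupt fault)."""
    return [v + amplitude if start <= i < stop else v
            for i, v in enumerate(signal)]


def add_ramp(signal, start, slope):
    """Add a linearly growing drift from index start to the end."""
    return [v + slope * (i - start) if i >= start else v
            for i, v in enumerate(signal)]


# Placeholder fault-free signal; real data would come from the simulator.
temperature = [0.0] * 512

# Case (A): one bias window. The amplitude 0.02 is an illustrative value.
case_a = add_bias(temperature, 149, 200, 0.02)

# Case (B): intermittent fault built from two bias windows.
case_b = add_bias(add_bias(temperature, 49, 100, 0.02), 349, 450, 0.10)

# Case (C): slow drift with slope 0.01 starting at sample 250.
case_c = add_ramp(temperature, 249, 0.01)
```

Marking the same index ranges as faulty gives the reference labels needed to compute the FAR and MDR of each chart.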
6. Conclusion
The objective of this chapter was to extend PLS fault‐detection methods to deal with uncertainty in the measurements. The developed approach merges the flexibility of the multiscale PLS model with the greater sensitivity of the EWMA control chart to incipient changes. Specifically, the multiscale PLS model is constructed using the wavelet coefficients at different scales, and the EWMA monitoring chart is then applied to the residuals of this model to further improve fault detection. Using a simulated distillation column, we demonstrated the effectiveness of MSPLS‐EWMA in detecting abrupt, intermittent, and drift faults. The results show that MSPLS‐EWMA achieves better fault‐detection efficiency than the PLS‐EWMA, PLS‐Q, and MSPLS‐Q monitoring approaches.
This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No: OSR‐2015‐CRG4‐2582.