## 1. Introduction

Monitoring chemical and environmental processes has attracted increasing attention from researchers and practitioners seeking to improve product quality and enhance process safety. For example, detecting anomalies in chemical or environmental plants affects not only the productivity and profitability of these plants but also the safety of people [1, 2]. To enhance process operation, the process must be monitored efficiently, and abnormal events that may degrade product quality, operational reliability, or profitability must be detected correctly, so that operators can respond by making the necessary corrections to the process. Fault detection and diagnosis are two vital components of process monitoring (see **Figure 1**): abnormal events are first identified and then isolated so that they can be handled appropriately [2, 3]. Faults in modern automated processes are generally difficult to avoid and may seriously degrade the process. Even small deviations in process parameters can result in lost time, and catastrophic failures can have devastating health, safety, and financial consequences. For this reason, engineers must continually improve the reliability of their processes and watch carefully for signs of anomalies that could lead to disaster. It is therefore crucial to detect and identify possible faults or failures in the system as early as possible [2, 4, 5].

Keeping an automated process running smoothly and safely while producing the desired results remains a major challenge in many sectors, and various fault detection techniques have been developed to ensure the safe operation of systems and processes. These techniques fall into two main types: process history‐based approaches and model‐based approaches, as shown in **Figure 2**. Model‐based approaches compare analytically computed outputs with measured values and signal an alarm when large differences are detected [2, 6, 7]. However, the effectiveness of model‐based fault detection depends on the accuracy of the models used. When no process model is available, model‐free or process‐history‐based methods have been used successfully in process monitoring because they can handle highly correlated process variables effectively [8, 9]. Such methods require minimal a priori knowledge of the process physics but depend on the availability of good‐quality input data. Process‐history‐based methods use implicit empirical models derived from the analysis of available data and rely on computational intelligence and machine learning methods [10–12]. Over the last four decades, process‐history‐based methods such as principal component analysis (PCA) and partial least squares (PLS) have become increasingly important in statistical process monitoring, and they have been applied extensively in chemometrics [5, 13, 14]. In contrast to classical univariate statistical process monitoring tools, these approaches account for the correlations between variables and monitor a set of correlated variables simultaneously. Moreover, by projecting the original measurements onto a latent subspace, they monitor latent variables (LVs) in a space of reduced dimension. A PCA or PLS model is built from historical data collected during normal process operation [15, 16], and this model can then be used to monitor or predict the future behavior of the process [17].

However, most processes are dynamic, with various events occurring, such as abrupt process changes, slow drifts, bad measurements due to sensor failures, and human errors. Data from these processes are therefore not only cross‐correlated but also autocorrelated. Applying conventional latent variable regression (LVR) methods directly to dynamic systems can produce false alarms and reduce their sensitivity in detecting and discriminating between different kinds of events. In addition, noisy data and model uncertainties degrade the performance of fault detection methods. Wavelet‐based multiscale representation of data has been shown to separate noise from features effectively, to approximately decorrelate autocorrelated data, and to make the data follow a Gaussian distribution more closely [18]. Multiscale representation of data using wavelets has been widely used for data denoising, compression, and process monitoring [18–21].

The detection of incipient faults is crucial for maintaining normal system operation because it provides early fault warnings. The difficulty is that incipient anomalies are often too weak to be detected by conventional monitoring methods. The objective of this chapter is to extend existing fault detection techniques to take the uncertainty of the data into account. To this end, multiscale data representation, a powerful feature extraction tool, is used to reduce false alarms by improving the separation of noise from features and by decorrelating autocorrelated measurement errors. Specifically, a fault detection technique that combines multiscale partial least squares (MSPLS) modeling with the exponentially weighted moving average (EWMA) chart is developed. The overarching goal of this work is to address multivariate process monitoring challenges by merging the advantages of the EWMA chart and multiscale PLS modeling. Using simulated distillation column data, we show that the MSPLS‐EWMA approach detects small faults significantly better than the PLS‐EWMA approach.

The remainder of this chapter is organized as follows. Section 2 gives a brief overview of PLS and the multiscale PLS approach. Section 3 briefly presents the EWMA chart. In Section 4, we present the proposed MSPLS‐EWMA fault detection procedure. Section 5 applies the proposed procedure to a simulated distillation column process. Finally, Section 6 concludes the chapter.

## 2. Preliminary materials

### 2.1. Partial least squares (PLS)‐based charts

The objective of PLS models is to find relations between input and output data blocks by relating their latent variables. A detailed description of the PLS technique is given in Ref. [22]. This data‐driven empirical statistical modeling approach is extremely useful when a first‐principles or analytical model is difficult to obtain or when the measured variables are highly correlated (collinear) with each other. PLS methods have been extensively researched and applied in chemometrics.

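Consider an input data matrix $\mathbf{X} \in \mathbb{R}^{n \times m}$ containing $n$ observations of $m$ input variables and an output data matrix $\mathbf{Y} \in \mathbb{R}^{n \times p}$ containing $p$ output variables. PLS extracts LVs from both data blocks (see **Figure 3**). The input and output matrices can be related to LVs as follows via the outer model [23]:

$$\mathbf{X} = \mathbf{T}\mathbf{P}^\top + \mathbf{E}, \qquad \mathbf{Y} = \mathbf{U}\mathbf{Q}^\top + \mathbf{F},$$

where $\mathbf{T}, \mathbf{U} \in \mathbb{R}^{n \times l}$ are the input and output score matrices, $\mathbf{P} \in \mathbb{R}^{m \times l}$ and $\mathbf{Q} \in \mathbb{R}^{p \times l}$ are the input and output loading matrices, $\mathbf{E}$ and $\mathbf{F}$ are residual matrices, and $l$ is the number of retained LVs. The input and output scores are linked by the inner model

$$\mathbf{U} = \mathbf{T}\mathbf{B} + \mathbf{H},$$

where $\mathbf{B} \in \mathbb{R}^{l \times l}$ is a diagonal matrix of inner regression coefficients and $\mathbf{H}$ is a residual matrix. Substituting the inner model into the outer model of $\mathbf{Y}$ gives the PLS prediction of the outputs,

$$\mathbf{Y} = \mathbf{T}\mathbf{B}\mathbf{Q}^\top + \tilde{\mathbf{F}},$$

where matrix $\tilde{\mathbf{F}} = \mathbf{H}\mathbf{Q}^\top + \mathbf{F}$ contains the output residuals.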
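For a single output, as in the distillation column case study of Section 5, a PLS model can be estimated with the NIPALS algorithm. The NumPy sketch below is an illustration under stated assumptions, not the chapter's implementation: the helper `pls1_fit` is hypothetical, the inputs are assumed autoscaled and the output centered, and the number of LVs is fixed by hand rather than chosen by cross‐validation.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """NIPALS PLS for one output (hypothetical helper, not the chapter's code).

    X: (n, m) autoscaled inputs; y: (n,) centered output; n_lv: number of LVs.
    Returns input weights W, input loadings P, inner coefficients b, and the
    regression vector beta such that y_hat = X @ beta.
    """
    X, y = np.array(X, dtype=float), np.array(y, dtype=float)  # work on copies
    m = X.shape[1]
    W, P, b = np.zeros((m, n_lv)), np.zeros((m, n_lv)), np.zeros(n_lv)
    for k in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)        # weight vector defining the k-th LV
        t = X @ w                     # input scores (latent variable)
        tt = t @ t
        p = X.T @ t / tt              # input loadings (outer model of X)
        b[k] = (y @ t) / tt           # inner-model coefficient linking t to y
        X -= np.outer(t, p)           # deflate the input block
        y -= b[k] * t                 # deflate the output block
        W[:, k], P[:, k] = w, p
    beta = W @ np.linalg.solve(P.T @ W, b)
    return W, P, b, beta


# Usage on synthetic data (illustrative only)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(200)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()
W, P, b, beta = pls1_fit(Xs, yc, n_lv=3)
r2 = 1 - np.sum((yc - Xs @ beta) ** 2) / np.sum(yc ** 2)
```

With as many LVs as inputs, PLS reproduces the ordinary least‐squares fit; retaining fewer LVs is what makes PLS robust to collinear inputs.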

### 2.2. Wavelet transform

Most engineering processes generate data with multiscale properties, meaning that the data contain both useful information and noise at different times and frequencies. Most fault detection approaches are based on time‐domain data (that is, they operate on a single time scale) and do not take the multiscale characteristics of the data into consideration. Wavelet analysis has been shown to represent data with multiscale properties efficiently, separating deterministic and stochastic features [18].

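Multiresolution time series decomposition was first introduced by Mallat, who used orthogonal wavelet bases for data compression in image coding [25]. Wavelets are a family of basis functions that are localized in both time and frequency and can be expressed as follows [18]:

$$\psi_{s,u}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-u}{s}\right),$$

where $s$ and $u$ are the dilation (scale) and translation parameters, respectively, and $\psi(t)$ is the mother wavelet. In the discrete case, these parameters are usually restricted to a dyadic grid, $s = 2^{j}$ and $u = k\,2^{j}$, where $j$ and $k$ are integers.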

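The discrete wavelet transform (DWT) analyzes the signal at different scales (or over different frequency bands) by decomposing the signal at each scale into a coarse approximation (low‐frequency information) and detail information (high‐frequency information). After decomposition to a depth $J$, the original signal can be represented as the sum of its coarsest approximation and all of its details:

$$x(t) = \sum_{k} a_{Jk}\,\phi_{Jk}(t) + \sum_{j=1}^{J}\sum_{k} d_{jk}\,\psi_{jk}(t),$$

where $\phi_{Jk}(t)$ and $\psi_{jk}(t)$ are the scaling and wavelet functions, and $a_{Jk}$ and $d_{jk}$ are the corresponding approximation and detail coefficients. This multiscale decomposition is illustrated in **Figure 4**.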
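The split into approximation and detail coefficients can be illustrated with a few lines of NumPy. This sketch assumes the orthonormal Haar wavelet because it is the simplest choice; the chapter does not say which wavelet filter it uses, and `haar_dwt`/`haar_idwt` are hypothetical helper names. Each level halves the number of approximation coefficients, mirroring the dyadic scales $j = 1, \ldots, J$.

```python
import numpy as np


def haar_dwt(x, depth):
    """Multi-level orthonormal Haar DWT (Haar wavelet is an assumption).

    Returns the approximation coefficients at the coarsest scale J = depth and
    the detail coefficients [d_1, ..., d_J], from the finest to the coarsest scale.
    """
    a = np.asarray(x, dtype=float)
    if a.size % 2 ** depth:
        raise ValueError("signal length must be divisible by 2**depth")
    details = []
    for _ in range(depth):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))  # high-pass branch: details
        a = (even + odd) / np.sqrt(2)              # low-pass branch: approximation
    return a, details


def haar_idwt(a, details):
    """Inverse of haar_dwt: rebuild the signal from the coarsest scale upward."""
    a = np.asarray(a, dtype=float)
    for d in reversed(details):
        x = np.empty(2 * a.size)
        x[0::2] = (a + d) / np.sqrt(2)
        x[1::2] = (a - d) / np.sqrt(2)
        a = x
    return a


t = np.linspace(0, 1, 64, endpoint=False)
x = np.sin(2 * np.pi * 3 * t) + 0.5 * (t >= 0.5)  # smooth wave plus a step
a3, ds = haar_dwt(x, depth=3)
print(a3.size, [d.size for d in ds])  # 8 [32, 16, 8]
x_rec = haar_idwt(a3, ds)
```

Because the Haar basis is orthonormal, the signal energy is split exactly between the approximation and detail coefficients, and the inverse transform recovers the signal.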

In the next section, we highlight the advantages of multiscale representation.

### 2.3. Advantages of multiscale representation

Conventional methods are referred to as time‐domain analysis methods. These methods are sensitive to impulsive oscillations and cannot extract frequency content and patterns that may be hidden in the data. Before multiscale wavelet analysis was introduced, tools such as Fourier transform analysis, coherence function analysis, and power spectral density analysis were used. However, these tools can only represent a signal in terms of the basis functions of the tool itself; for example, Fourier transform analysis decomposes the signal into a sum of sine and cosine functions. Multiscale representation overcomes this limitation by examining the time and frequency domains simultaneously, whereas the Fourier transform can only switch between the time and frequency domains.

Ganesan et al., in a literature review of multiscale statistical process monitoring, state the following advantages of using wavelet coefficients in multiscale statistical process control (MSSPC) over conventional statistical process control (SPC) methods [20]:

### 2.4. Separating noise feature

Two important applications, data compression and data denoising, can be achieved through multiscale wavelet decomposition. One of the biggest advantages of multiscale representation is its ability to distinguish measurement noise from useful data features by applying low‐pass and high‐pass filters to the data during the decomposition. This separates features at different resolutions or frequencies, which makes multiscale representation a better tool for filtering or denoising noisy data than traditional linear filters such as the mean filter and the EWMA filter. Despite their popularity, linear filters rely on a frequency threshold above which all features are treated as measurement noise. The noise‐separation ability of multiscale representation has been used not only to improve data filtering but also to improve the prediction accuracy of several empirical modeling methods and the accuracy of state estimators.

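A noisy signal is filtered by a three‐step method [30]:

1. Decompose the noisy signal into approximation and detail coefficients using a selected wavelet and decomposition depth.
2. Threshold the detail coefficients, eliminating (or shrinking) those whose magnitudes fall below the threshold, since they are dominated by noise.
3. Reconstruct the denoised signal from the approximation coefficients and the thresholded detail coefficients using the inverse wavelet transform.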
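The three steps can be sketched as follows, again assuming the Haar wavelet. The thresholding rule is also an assumption: we use soft thresholding with Donoho's universal threshold, a common default, whereas the chapter does not state which rule Ref. [30] prescribes. `haar_denoise` is a hypothetical helper.

```python
import numpy as np


def haar_denoise(x, depth):
    """Three-step wavelet denoising (Haar wavelet and soft thresholding assumed)."""
    x = np.asarray(x, dtype=float)
    if x.size % 2 ** depth:
        raise ValueError("signal length must be divisible by 2**depth")
    # Step 1: decompose into approximation and detail coefficients
    a, details = x, []
    for _ in range(depth):
        details.append((a[0::2] - a[1::2]) / np.sqrt(2))
        a = (a[0::2] + a[1::2]) / np.sqrt(2)
    # Step 2: soft-threshold the details with the universal threshold
    # sigma * sqrt(2 ln n); sigma is estimated robustly (median absolute value
    # / 0.6745) from the finest-scale details, which are dominated by noise
    sigma = np.median(np.abs(details[0])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(x.size))
    details = [np.sign(d) * np.maximum(np.abs(d) - thr, 0.0) for d in details]
    # Step 3: reconstruct from the approximation and the thresholded details
    for d in reversed(details):
        rec = np.empty(2 * a.size)
        rec[0::2] = (a + d) / np.sqrt(2)
        rec[1::2] = (a - d) / np.sqrt(2)
        a = rec
    return a


rng = np.random.default_rng(0)
n = 1024
clean = np.sin(2 * np.pi * 2 * np.arange(n) / n)
noisy = clean + 0.3 * rng.standard_normal(n)
denoised = haar_denoise(noisy, depth=4)
mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
```

Only the detail coefficients are thresholded, so the coarse approximation, which carries the slowly varying process behavior, passes through unchanged.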

### 2.5. Multiscale PLS modeling

Data observed from environmental and engineering processes are usually noisy and correlated in time. This makes fault detection more difficult, because noise degrades detection quality and most methods are developed for independent observations. Multiscale representation of data using wavelets is a powerful feature extraction tool that is well suited to denoising and decorrelating time series data.

The integrated multiscale PLS (MSPLS) modeling approach takes advantage of both latent variable regression and the denoising ability of multiscale wavelet decomposition. This improves the prediction ability of the model, which in turn improves fault detection. The given input variable data matrix

where

However, denoising the input and output variables before developing the model can result in poor prediction ability of the MSPLS model, because features that may be important to the model are removed. Therefore, in the proposed integrated MSPLS modeling approach, the optimal decomposition depth is selected based on the prediction ability of the resulting MSPLS model. The integrated MSPLS modeling algorithm is summarized next [8].

1. Preprocess the training and testing data so that all variables have zero mean and unit variance.
2. Apply wavelet decomposition to convert the data into wavelet coefficients. This changes the data from a single scale to multiple scales, which enables multiscale modeling.
3. Filter the training data at different scales using the filtering algorithm given in Section 2.4.
4. Build a PLS model using the filtered data at each scale, using cross‐validation to determine the number of LVs.
5. Use the model estimated at each scale to predict the output for the testing data, and compute the cross‐validated mean square error.
6. Choose the PLS model with the smallest cross‐validated mean square error as the MSPLS model.
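A compact sketch of the integrated MSPLS procedure is shown below. It is a simplification, not the authors' code: Haar wavelets with universal soft thresholding stand in for the filter of Section 2.4, the number of LVs is fixed rather than cross‐validated, and the selection criterion is the mean square prediction error on the testing data. All function names are hypothetical.

```python
import numpy as np


def haar_denoise(x, depth):
    """Haar decomposition, universal soft thresholding of details, reconstruction."""
    a, details = np.asarray(x, dtype=float), []
    for _ in range(depth):
        details.append((a[0::2] - a[1::2]) / np.sqrt(2))
        a = (a[0::2] + a[1::2]) / np.sqrt(2)
    thr = np.median(np.abs(details[0])) / 0.6745 * np.sqrt(2 * np.log(len(x)))
    for d in reversed(details):
        d = np.sign(d) * np.maximum(np.abs(d) - thr, 0.0)
        rec = np.empty(2 * a.size)
        rec[0::2], rec[1::2] = (a + d) / np.sqrt(2), (a - d) / np.sqrt(2)
        a = rec
    return a


def pls1_beta(X, y, n_lv):
    """NIPALS PLS1 regression vector so that y_hat = X @ beta."""
    X, y = np.array(X, dtype=float), np.array(y, dtype=float)
    W, P, b = [], [], []
    for _ in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        p, c = X.T @ t / tt, (y @ t) / tt
        X, y = X - np.outer(t, p), y - c * t
        W.append(w)
        P.append(p)
        b.append(c)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(b))


def integrated_mspls(X_train, y_train, X_test, y_test, n_lv, max_depth):
    """Select the decomposition depth whose filtered-data PLS model predicts best."""
    # Step 1: autoscale everything with the training statistics
    mx, sx = X_train.mean(axis=0), X_train.std(axis=0)
    my, sy = y_train.mean(), y_train.std()
    Xtr, ytr = (X_train - mx) / sx, (y_train - my) / sy
    Xte, yte = (X_test - mx) / sx, (y_test - my) / sy
    mse = {}
    for depth in range(1, max_depth + 1):
        # Steps 2-3: decompose and filter each training variable at this depth
        Xf = np.column_stack([haar_denoise(col, depth) for col in Xtr.T])
        yf = haar_denoise(ytr, depth)
        # Step 4: PLS model on the filtered training data
        beta = pls1_beta(Xf, yf, n_lv)
        # Step 5: prediction error on the (unfiltered) testing data
        mse[depth] = float(np.mean((yte - Xte @ beta) ** 2))
    # Step 6: keep the depth with the smallest prediction error
    best = min(mse, key=mse.get)
    return best, mse


# Synthetic example: 6 noisy inputs driven by 2 slowly varying latent signals
rng = np.random.default_rng(1)
n = 1024
k = np.arange(n)
latent = np.column_stack([np.sin(2 * np.pi * k / 256), np.cos(2 * np.pi * k / 100)])
X = latent @ rng.standard_normal((2, 6)) + 0.3 * rng.standard_normal((n, 6))
y = latent @ np.array([1.0, -0.5]) + 0.3 * rng.standard_normal(n)
best, mse = integrated_mspls(X[:512], y[:512], X[512:], y[512:], n_lv=2, max_depth=4)
```

Only the training data are filtered; the testing data stay unfiltered, so the selected depth reflects how well a model built on denoised data predicts raw measurements.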

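Once an MSPLS model based on past normal operation has been obtained, it can be used to monitor future deviations from normality. Two monitoring statistics, Hotelling's $T^2$ and the $Q$ statistic, are typically used for this purpose (see **Figure 5**).

The $T^2$ statistic measures variations in the LV (score) subspace:

$$T^2 = \mathbf{t}^\top \boldsymbol{\Lambda}^{-1} \mathbf{t},$$

where $\mathbf{t}$ is the vector of the $l$ retained scores of a new observation and $\boldsymbol{\Lambda}$ is the diagonal matrix of the score variances estimated from the training data. A fault is declared when $T^2$ exceeds the control limit $T^2_\alpha$, which, assuming normally distributed data, can be computed as

$$T^2_\alpha = \frac{l\,(n^2-1)}{n\,(n-l)}\,F_{\alpha}(l,\, n-l),$$

where $n$ is the number of training observations and $F_{\alpha}(l, n-l)$ is the upper $\alpha$ critical value of the $F$ distribution with $l$ and $n-l$ degrees of freedom.

The squared prediction error (SPE), or $Q$ statistic, captures the changes in the residual subspace:

$$Q = \lVert \mathbf{e} \rVert^2 = \lVert \mathbf{x} - \hat{\mathbf{x}} \rVert^2,$$

where $\hat{\mathbf{x}}$ is the reconstruction of the observation $\mathbf{x}$ by the model. A fault is declared when $Q$ exceeds its control limit $Q_\alpha$.

However, the MSPLS‐based $T^2$ and $Q$ charts use only the information in the current observation, which makes them relatively insensitive to small faults.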
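The two statistics can be computed from a fitted PLS model as in the sketch below. It assumes a single output, NIPALS PLS, and autoscaled inputs, and for brevity it replaces distribution‐based control limits with empirical 95th percentiles of the training statistics; the helper names are hypothetical.

```python
import numpy as np


def pls1_weights_loadings(X, y, n_lv):
    """NIPALS PLS1 returning input weights W and loadings P (hypothetical helper)."""
    X, y = np.array(X, dtype=float), np.array(y, dtype=float)
    W, P = [], []
    for _ in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        p = X.T @ t / (t @ t)
        y = y - (y @ t) / (t @ t) * t
        X = X - np.outer(t, p)
        W.append(w)
        P.append(p)
    return np.array(W).T, np.array(P).T


def t2_and_q(X, W, P, score_var):
    """Hotelling T^2 (score subspace) and Q/SPE (residual subspace) per row of X."""
    R = W @ np.linalg.inv(P.T @ W)   # maps autoscaled inputs straight to PLS scores
    T = X @ R
    E = X - T @ P.T                  # part of each observation not explained by the LVs
    return np.sum(T ** 2 / score_var, axis=1), np.sum(E ** 2, axis=1)


# Illustrative training data: 6 correlated inputs driven by 2 latent signals
rng = np.random.default_rng(2)
latent = rng.standard_normal((500, 2))
X = latent @ rng.standard_normal((2, 6)) + 0.1 * rng.standard_normal((500, 6))
y = latent @ np.array([1.0, -1.0]) + 0.1 * rng.standard_normal(500)
mx, sx = X.mean(axis=0), X.std(axis=0)
Xs, yc = (X - mx) / sx, y - y.mean()

W, P = pls1_weights_loadings(Xs, yc, n_lv=2)
score_var = (Xs @ W @ np.linalg.inv(P.T @ W)).var(axis=0, ddof=1)
t2_train, q_train = t2_and_q(Xs, W, P, score_var)
# Empirical 95% limits: a simplification in place of distributional limits
t2_lim, q_lim = np.percentile(t2_train, 95), np.percentile(q_train, 95)

x_fault = Xs[:1].copy()
x_fault[0, 0] += 10.0            # large bias on the first (autoscaled) input
t2_f, q_f = t2_and_q(x_fault, W, P, score_var)
print(q_f[0] > q_lim)            # the bias breaks the correlation structure, so Q rises
```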

## 3. EWMA monitoring charts

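In this section, we briefly introduce the basic idea of the EWMA chart and its properties; for a more detailed discussion of EWMA charts, see Ref. [33]. The EWMA statistic gives less weight to old data and more weight to new data. Because the EWMA statistic is a time‐weighted average of all previous observations, EWMA charts can detect small shifts in the process mean. The EWMA control scheme was first introduced by Roberts [34] and is used extensively in time series analysis. The EWMA monitoring chart is an anomaly detection technique widely used by scientists and engineers in various disciplines [6, 33, 35]. Assume that $x_1, x_2, \ldots, x_n$ are individual observations collected from the monitored process. The EWMA statistic is computed recursively as

$$z_t = \lambda x_t + (1-\lambda)\, z_{t-1}, \qquad t = 1, 2, \ldots, n.$$

The starting value $z_0$ is usually set to the mean of the fault‐free data, $\mu_0$. Expanding the recursion gives

$$z_t = \lambda \sum_{i=1}^{t} (1-\lambda)^{t-i} x_i + (1-\lambda)^t z_0,$$

where $0 < \lambda \le 1$ is the smoothing parameter. The weights decrease geometrically with the age of the observations, and smaller values of $\lambda$ give more weight to past data.

Under fault‐free conditions, the standard deviation of $z_t$ is

$$\sigma_{z_t} = \sigma_0 \sqrt{\frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2t}\right]},$$

where $\sigma_0$ is the standard deviation of the fault‐free data. The upper and lower control limits of the EWMA chart are then

$$\mathrm{UCL}/\mathrm{LCL} = \mu_0 \pm L\,\sigma_{z_t},$$

where $L$ is a multiplier that sets the width of the control limits. A fault is declared when $z_t$ falls outside these limits.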
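A minimal sketch of the EWMA chart for individual observations follows. The values $\lambda = 0.2$ and $L = 3$ are common textbook defaults used here only for illustration; the chapter's own settings are not given in this section, and `ewma_chart` is a hypothetical helper.

```python
import numpy as np


def ewma_chart(x, mu0, sigma0, lam=0.2, L=3.0):
    """EWMA statistic with time-varying control limits (illustrative lam and L)."""
    x = np.asarray(x, dtype=float)
    z = np.empty_like(x)
    prev = mu0                                  # z_0 = mu_0
    for t, xt in enumerate(x):
        prev = lam * xt + (1 - lam) * prev      # z_t = lam*x_t + (1-lam)*z_{t-1}
        z[t] = prev
    k = np.arange(1, x.size + 1)                # time index t = 1, ..., n
    sigma_z = sigma0 * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * k)))
    ucl, lcl = mu0 + L * sigma_z, mu0 - L * sigma_z
    alarms = (z > ucl) | (z < lcl)
    return z, ucl, lcl, alarms


x = np.r_[np.zeros(50), 2.0 * np.ones(50)]      # mean shift of 2*sigma0 at sample 50
z, ucl, lcl, alarms = ewma_chart(x, mu0=0.0, sigma0=1.0)
print(np.argmax(alarms))                        # 53: the 4th shifted sample
```

With $\lambda = 0.2$ and $L = 3$, the steady‐state limits are $\pm\sigma_0$ (since $3\sqrt{0.2/1.8} = 1$), so in this noise‐free example the sustained shift of $2\sigma_0$ is flagged at the fourth shifted sample.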

## 4. Combining MSPLS model with EWMA chart: MSPLS‐EWMA

In this chapter, we combine the advantages of MSPLS modeling with those of the univariate EWMA monitoring chart, which results in an improved fault detection system, especially for detecting small faults in highly correlated multivariate data. To this end, we apply EWMA charts to the output residuals obtained from the MSPLS model (see **Figure 6**). Under normal operation with little noise and few errors, these residuals are close to zero, whereas they deviate significantly from zero in the presence of abnormal events. In this work, the output residuals from MSPLS are therefore used as the fault indicator.

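As given in Eq. (6), the output vector $\mathbf{y}$ is predicted by the MSPLS model as $\hat{\mathbf{y}}$.

The residual of the output variable, $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}$, is then monitored with the EWMA chart.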

Because the EWMA control scheme is applied to the residuals of the single output variable, only one EWMA decision function needs to be computed to monitor the process.
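The MSPLS‐EWMA logic reduces to three steps: predict the output with a model trained on normal data, form the residual $e = y - \hat{y}$, and run the EWMA chart on that residual. In the sketch below, an ordinary least‐squares regression is used as a stand‐in for the MSPLS model to keep the example short, the data are synthetic, and $\lambda = 0.2$, $L = 3$ are illustrative defaults; none of these choices comes from the chapter, and `ewma_alarms` is a hypothetical helper.

```python
import numpy as np


def ewma_alarms(e, sigma0, lam=0.2, L=3.0):
    """EWMA of residuals (target mean 0) with time-varying +/- L*sigma_z limits."""
    z, prev = np.empty(len(e)), 0.0
    for t, et in enumerate(e):
        prev = lam * et + (1 - lam) * prev
        z[t] = prev
    k = np.arange(1, len(e) + 1)
    limit = L * sigma0 * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * k)))
    return z, np.abs(z) > limit


rng = np.random.default_rng(3)
beta_true = np.array([1.0, 0.5, -0.8, 0.3])
X = rng.standard_normal((1024, 4))
y = X @ beta_true + 0.1 * rng.standard_normal(1024)
X_train, y_train = X[:512], y[:512]
X_test, y_test = X[512:].copy(), y[512:]

# Stand-in for the MSPLS model: ordinary least squares on the training data
beta_hat = np.linalg.lstsq(X_train, y_train, rcond=None)[0]
sigma0 = np.std(y_train - X_train @ beta_hat, ddof=1)

# Abrupt bias on the first input sensor from test sample 256 onward
X_test[256:, 0] += 0.5
residual = y_test - X_test @ beta_hat       # e = y - y_hat
z, alarms = ewma_alarms(residual, sigma0)
print(alarms[:256].mean(), alarms[256:].mean())
```

A bias on an input sensor enters the prediction $\hat{y}$ but not the measured output, so the residual mean shifts and the EWMA statistic accumulates that shift over successive samples.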

## 5. Monitoring a simulated distillation column

In this section, the ability of the proposed MSPLS‐EWMA technique to detect faults is studied using simulated data, and the results are compared with those obtained using the traditional PLS‐EWMA method. In all monitoring charts, the red‐shaded area marks the region where the fault is injected into the test data, and the 95% control limits are plotted as horizontal dashed lines.

### 5.1. Description and data generation of the process

A distillation column is the most commonly used unit operation in the chemical process industries. The objective of distillation is to separate the components of a mixture. Because distillation is very energy intensive, monitoring such processes plays an important role in reducing operating costs. The schematic diagram of the distillation column is shown in **Figure 7**.

The efficacy of the proposed fault detection strategy was tested using a distillation column simulated with the ASPEN simulation software [36]. The input variables consist of temperature measurements at different locations along the distillation column, together with the feed flow rate and the reflux flow rate. The light distillate from the reflux drum is taken as the response variable. The nominal operating conditions and the detailed steps involved in data generation can be found in Ref. [36]. The 1024 generated samples are corrupted with zero‐mean Gaussian white noise at a signal‐to‐noise ratio (SNR) of 10 dB and then used for model development and for testing the fault detection (FD) strategy. **Figure 8** shows the dynamic data of the distillation column, i.e., variations of the light component in response to changes in the reflux and feed flows. The MSPLS model is developed from the first 512 samples, and the remaining samples are used for testing. The optimal number of LVs, determined by cross‐validation, is three for the MSPLS model.
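Corrupting simulated data with noise at a prescribed SNR can be sketched as below. We assume the SNR is defined per variable as the ratio of the signal variance to the noise variance, expressed in decibels; the chapter does not spell out its definition, and `add_noise_snr_db` is a hypothetical helper.

```python
import numpy as np


def add_noise_snr_db(X, snr_db, rng):
    """Add zero-mean Gaussian white noise to each column of X at a given SNR (dB).

    Assumes SNR_dB = 10 * log10(var(signal) / var(noise)) per variable.
    """
    X = np.asarray(X, dtype=float)
    noise_var = X.var(axis=0) / 10 ** (snr_db / 10)
    return X + rng.standard_normal(X.shape) * np.sqrt(noise_var)


rng = np.random.default_rng(4)
t = np.arange(100_000)
clean = np.column_stack([np.sin(2 * np.pi * t / 500), np.cos(2 * np.pi * t / 777)])
noisy = add_noise_snr_db(clean, snr_db=10.0, rng=rng)
noise = noisy - clean
achieved_db = 10 * np.log10(clean.var(axis=0) / noise.var(axis=0))
```

At 10 dB the noise variance is one tenth of the signal variance of each variable.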

A scatter plot of the measured and predicted data is presented in **Figure 9**. This plot indicates a reasonable performance of the selected models.

### 5.2. Detection results

After a process model has been successfully identified, we can proceed with fault detection. Three types of faults in distillation columns will be considered here: abrupt, intermittent, and gradual faults.

To quantify the efficiency of the proposed strategies, we use two metrics: the false alarm rate (FAR) and the missed detection rate (MDR) [37]. The FAR is the number of normal observations wrongly judged as faulty (false alarms) divided by the total number of fault‐free observations. The MDR is the number of faulty observations wrongly classified as normal (missed detections) divided by the total number of faulty observations.
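Both metrics follow directly from a boolean alarm sequence and the known fault labels of the test data. The helper `far_mdr` below is a hypothetical illustration of these definitions, with both rates returned in percent.

```python
import numpy as np


def far_mdr(alarms, faulty):
    """False alarm rate and missed detection rate, in percent."""
    alarms = np.asarray(alarms, dtype=bool)
    faulty = np.asarray(faulty, dtype=bool)
    far = 100.0 * np.sum(alarms & ~faulty) / np.sum(~faulty)  # false alarms / fault-free samples
    mdr = 100.0 * np.sum(~alarms & faulty) / np.sum(faulty)   # missed faults / faulty samples
    return float(far), float(mdr)


faulty = np.arange(10) >= 5                      # fault injected at sample 5
alarms = np.array([0, 1, 0, 0, 0, 0, 1, 1, 1, 1], dtype=bool)
print(far_mdr(alarms, faulty))                   # (20.0, 20.0)
```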

#### 5.2.1. Case (A): abrupt fault detection

In this case study, an abrupt change is simulated by adding a small constant deviation which is 2% of the total variation in temperature
*Q* monitoring performances. Results of the PLS‐*Q* and MSPLS‐*Q* statistics are shown in **Figure 10(a)** and **(c)**, respectively. It can be seen from **Figure 10(a)** and **(c)** that neither PLS‐*Q* nor MSPLS‐*Q* can detect this small fault. **Figure 10(b)** shows that the PLS‐EWMA chart is capable of detecting this simulated fault, but with many missed detections (MDR = 55%, FAR = 0.96%). **Figure 10(d)** shows that the MSPLS‐EWMA chart clearly detects this abrupt fault without missed detections (MDR = 0%, FAR = 0.96%).

#### 5.2.2. Case (B): intermittent fault

In this case study, we introduce into the testing data a bias of amplitude 2% of the total variation in temperature
**Figure 11(a)**–**(d)** shows the monitoring results of the PLS‐based
**Figure 11(a)** shows that the PLS‐based
**Figure 11(b)**, it can be seen that the MSPLS‐*Q* chart can detect the intermittent faults but with several missed detections. **Figure 11(c)** shows that the PLS‐EWMA chart can indeed detect this fault, but with some missed detections. On the other hand, the MSPLS‐EWMA chart with
**Figure 11(d)**). This case study shows that the MSPLS‐EWMA chart considerably enhances detection performance compared with the other charts.

#### 5.2.3. Case (C): drift failure detection

A slow drift fault is simulated by adding a ramp change with a slope of 0.01 to the temperature sensor,
**Figure 12(a)**–**(d)**. **Figure 12(a)** shows the monitoring results of the PLS‐*Q* chart, which first signals the fault at sample 313 and has a significant false alarm rate (FAR = 22.4%). **Figure 12(b)** shows that the PLS‐EWMA chart first detects the fault at the 290th observation. The MSPLS‐*Q* chart, shown in **Figure 12(c)**, first flags the fault at sample 323. **Figure 12(d)** shows that the MSPLS‐EWMA chart first detects the fault at the 288th observation. Therefore, the MSPLS‐EWMA chart needs fewer observations to detect the fault than the other charts.
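The three fault scenarios of this section can be reproduced on a sensor signal with simple additive perturbations, sketched below. The amplitude of 2% of the total variation and the ramp slope of 0.01 follow the text; interpreting "total variation" as the range (maximum minus minimum) of the fault‐free signal, applying the slope per sample, the fault windows, and the helper names are all assumptions for illustration.

```python
import numpy as np


def abrupt_fault(x, start, rel_amp=0.02):
    """Constant bias of rel_amp * range(x) from sample `start` onward."""
    x = np.asarray(x, dtype=float).copy()
    x[start:] += rel_amp * (x.max() - x.min())   # amplitude from the fault-free range
    return x


def intermittent_fault(x, windows, rel_amp=0.02):
    """Same bias, applied only inside the (begin, end) sample windows."""
    x = np.asarray(x, dtype=float).copy()
    amp = rel_amp * (x.max() - x.min())
    for begin, end in windows:
        x[begin:end] += amp
    return x


def drift_fault(x, start, slope=0.01):
    """Ramp of `slope` per sample, starting from zero at sample `start`."""
    x = np.asarray(x, dtype=float).copy()
    x[start:] += slope * np.arange(x.size - start)
    return x


x = np.linspace(300.0, 310.0, 101)               # hypothetical temperature record
xa = abrupt_fault(x, start=50)
xi = intermittent_fault(x, windows=[(20, 30), (60, 70)])
xd = drift_fault(x, start=50)
```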

This case study again demonstrates the advantage of the proposed approach over conventional PLS‐based fault detection. Overall, the simulated data show that combining the MSPLS model with the EWMA chart yields a significant improvement in fault detection.

## 6. Conclusion

The objective of this chapter was to extend PLS fault detection methods to deal with uncertainty in the measurements. The developed approach merges the flexibility of the multiscale PLS model with the greater sensitivity of the EWMA control chart to incipient changes. Specifically, the multiscale PLS model was constructed using wavelet coefficients at different scales, and an EWMA monitoring chart was then applied to the residuals of this model to further improve its fault detection ability. Using a simulated distillation column, we demonstrated the effectiveness of MSPLS‐EWMA in detecting abrupt, intermittent, and drift faults. The results show that MSPLS‐EWMA achieves better fault detection than the PLS‐EWMA, PLS‐*Q*, and MSPLS‐*Q* monitoring approaches.