## 1. Introduction

As complex and expensive mechanical systems, gas turbine engines benefit greatly from advanced diagnostic technologies, and the use of monitoring systems has become standard practice. To perform effective analysis, different diagnostic approaches covering all gas turbine subsystems are used. The diagnostic algorithms based on measured gas path variables are considered principal and rather complex. These variables (air and gas pressures and temperatures, rotation speeds, fuel consumption, etc.) carry valuable information about an engine’s health condition and make it possible to detect and identify different abrupt engine faults and deterioration mechanisms (for instance, foreign object damage, fouling, erosion, tip rubs, and seal wear). Malfunctions of measurement and control systems can be diagnosed as well. Thousands of technical publications devoted to gas path diagnosis can be found. They can be arranged according to the input information and the mathematical models applied.

Although advances in instrumentation and computer science have enabled extensive field data collection, data with gas turbine faults are still scarce because real faults occur rarely. Some intensive and practically permanent deterioration mechanisms, for example, compressor fouling, can be described on the basis of real data. However, to describe the variety of all possible faults, mathematical models are widely used. These models and the diagnostic methods that use them fall into two main categories: physics-based and data-driven.

A thermodynamic engine model is a representative physics-based model. This nonlinear model is based on thermodynamic relations between gas path variables and also employs the mass, energy, and momentum conservation laws. Such a sophisticated model has been used in gas turbine diagnostics since the work of Saravanamuttoo H.I.H. (see, e.g., [1]). The model makes it possible to simulate the gas path variables for an engine baseline (healthy engine performance) and for different faults embedded into the model through special internal coefficients called fault parameters. Applying system identification methods to the thermodynamic model, an inverse problem is solved: unknown fault parameters are estimated using measured gas path variables. During the identification, the parameters that minimize the difference between the model variables and the measured ones are found. Besides improving model accuracy, this simplifies the diagnosing process because the fault parameter estimates contain information about current engine health. The diagnostic algorithms based on model identification constitute one of the two main approaches in gas turbine diagnostics (see, for instance, [1–4]).

The second approach uses pattern recognition theory. Since model inaccuracy and measurement errors impede a correct diagnosis, gas path fault localization can be characterized as a challenging recognition problem. Numerous applications of recognition tools in gas path diagnostics are known, for instance, genetic algorithms [5], correspondence and discrimination analysis [6], k-nearest neighbors [7], and the Bayesian approach [8]. However, the most widespread techniques are artificial neural networks (ANNs). ANN applications are not limited to fault recognition; the networks are also applied, or can be applied, at other diagnostic stages: feature extraction, fault detection, and fault prediction.

At the feature extraction stage, differences (a.k.a. deviations) between actual gas path measurements and an engine baseline are determined because they are by far better indicators of engine health than the measurements themselves are. To build the necessary baseline model, the multilayer perceptron (MLP), also called a back-propagation network, is usually employed [9, 10]. To filter noise, an auto-associative configuration of the perceptron is sometimes applied to the measurements [11].

At the fault localization stage, fault classes can be represented by sets of deviations (patterns) induced by the corresponding faults. Such a pattern-based classification makes it possible to apply ANNs as recognition techniques, and multiple applications of the MLP (see, e.g., [4, 5]) as well as the radial basis network (RBN) [5], the probabilistic neural network (PNN) [12, 13], and support vector machines (SVMs), also known as support vector networks (SVNs) [7, 12], have been reported. In spite of many publications on gas turbine fault recognition, comparative studies that help choose the best technique [4, 5, 7, 12] are still insufficient. They do not cover all of the techniques used and often provide differing recommendations.

The fault detection stage can also be presented as a pattern recognition problem with two classes to recognize: a class of healthy engines and a class of faulty engines. If a classification for the fault localization stage is available, using its patterns to build a fault detection classification does not seem challenging. However, studies applying recognition techniques, in particular ANNs, to gas turbine fault detection have been absent so far. Instead, the detection problem is solved by tolerance monitoring [14, 15].

The fault prediction stage is less investigated than the previous stages, and only a few ANN applications are known. Among them, it is worth mentioning book [16], which analyzes ways to predict gas turbine faults, and study [17], which compares a recurrent neural network and a nonlinear auto-regressive neural network. Taking all stages together, the perceptron is by far the most widely used network: it is applied to filter the measurements, approximate the engine baseline, and recognize the faults.

Thus, this brief review of the neural networks applied to gas turbine diagnosis has revealed that the multiple known cases of their use need better generalization and recommendations for choosing the best network. Promising areas of ANN application were also found. In the present chapter, we generalize our investigations aimed at optimizing the total diagnostic process by enhancing each of its elements. On the one hand, the neural networks are critical elements that help implement this process. On the other hand, the networks themselves are objects of analysis: for known applications, they are compared to choose the best network, and one new application is proposed. During the investigations, rules for proper network usage have also been established.

The rest of the chapter describes these investigations and is structured as follows: description of the networks used (Section 2), network-based diagnostic approach (Section 3), diagnostic process optimization (Section 4), feature extraction stage optimization (Section 5), fault detection stage optimization (Section 6), and fault localization stage optimization (Section 7).

## 2. Artificial neural networks

The four networks mentioned in the introduction have been chosen for investigation: MLP, RBN, PNN, and SVN. The PNN is a realization of the Parzen windows method and has the important property of probabilistic outputs, that is, gas turbine faults are recognized on the basis of their confidence probabilities. These probabilities are computed through numerical estimates of the probability density of fault patterns. For comparison, a similar recognition tool, the k-nearest neighbors (K-NN) method, has been included in the investigations. Foundations of the chosen techniques can be found in many books on classification theory, for example, [18, 19, 20]. The next subsections include only a brief description of the techniques required to understand the present chapter.

### 2.1. Multilayer perceptron

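The perceptron can solve either approximation or classification problems. The scheme shown in **Figure 1** illustrates the structure and operation of the MLP [18, 19]. The perceptron is a feed-forward neural network: there is no feedback, and all signals go only from the input to the output.

To determine the hidden layer input vector $n_1$, a weight matrix $W_1$ is multiplied by the network input vector (pattern) $p$, and a bias vector $b_1$ is added. A transfer function $f_1$ then transforms this vector into the hidden layer output vector $a_1$. The output layer operates in the same way with a weight matrix $W_2$, a bias vector $b_2$, and a transfer function $f_2$:

$$a_1 = f_1(W_1 p + b_1), \qquad a_2 = f_2(W_2 a_1 + b_2).$$

To find the unknown matrices $W_1$, $W_2$ and bias vectors $b_1$, $b_2$, the network is trained on learning patterns with known outputs: the back-propagation algorithm iteratively adjusts the weights and biases to minimize the error between the network outputs and the target outputs.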
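The forward pass of the two-layer perceptron in **Figure 1** can be sketched in Python as follows. This is a minimal illustration, assuming tan-sigmoid (tanh) transfer functions in both layers, as used for the baseline model in Section 5; the weights are arbitrary illustrative numbers, and training by back-propagation is not shown.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]
Vector = List[float]


def layer(W: Matrix, b: Vector, x: Vector, f: Callable[[float], float]) -> Vector:
    """One network layer: a = f(W x + b), with f applied element-wise."""
    return [f(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]


def mlp_forward(W1: Matrix, b1: Vector, W2: Matrix, b2: Vector, p: Vector) -> Vector:
    """Feed-forward pass: a1 = f1(W1 p + b1), a2 = f2(W2 a1 + b2), both f = tanh here."""
    a1 = layer(W1, b1, p, math.tanh)
    return layer(W2, b2, a1, math.tanh)


# Illustrative 2 x 3 x 2 network (arbitrary weights, not a trained model)
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b1 = [0.0, 0.1, -0.1]
W2 = [[1.0, -1.0, 0.5], [-0.5, 0.2, 1.0]]
b2 = [0.0, 0.0]
output = mlp_forward(W1, b1, W2, b2, [1.0, 0.5])
print(output)  # two values, each inside (-1, 1) because of tanh
```

The bounded tanh output is why the monitored quantities must be normalized before an MLP of this kind can approximate them.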

### 2.2. Radial basis network

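**Figure 2** illustrates the operation of an RBN. It includes two layers: a hidden radial basis layer and a linear output layer. Radial basis neurons operate differently from perceptron neurons [18, 19, 20]. The neuron's input $n$ is formed as the Euclidean distance $\|w - p\|$ between the neuron's weight vector $w$ and the input vector $p$, multiplied by the bias $b$. The radial basis transfer function $a = e^{-n^2}$ then gives the neuron output. In this way, $a = 1$ when the input vector coincides with the weight vector, and the output decreases as the distance increases. The bias $b$ allows changing the neuron sensitivity. The output layer transforms the radial basis output vector $a_1$ linearly into the network output $a_2 = W_2 a_1 + b_2$.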
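The radial basis neuron and the linear output layer can be sketched as follows. The sketch assumes the transfer function $a = e^{-n^2}$ with $n = \|w - p\|\,b$; the centers, bias, and output weights in the example lines are hypothetical values chosen only to show the behavior.

```python
import math
from typing import List, Sequence


def radbas_neuron(w: Sequence[float], p: Sequence[float], b: float) -> float:
    """Radial basis neuron: n = ||w - p|| * b, a = exp(-n^2)."""
    n = math.dist(w, p) * b
    return math.exp(-n * n)


def rbn_forward(centers: Sequence[Sequence[float]], b: float,
                W2: Sequence[Sequence[float]], b2: Sequence[float],
                p: Sequence[float]) -> List[float]:
    """Hidden radial basis layer followed by a linear output layer a2 = W2 a1 + b2."""
    a1 = [radbas_neuron(c, p, b) for c in centers]
    return [sum(w * a for w, a in zip(row, a1)) + bias for row, bias in zip(W2, b2)]


print(radbas_neuron([1.0, 2.0], [1.0, 2.0], 0.8))  # 1.0: zero distance gives the maximum
print(radbas_neuron([0.0], [1.0], 1.0))            # exp(-1), about 0.368
print(radbas_neuron([0.0], [1.0], 2.0))            # larger b: more sensitive, smaller output
```

Increasing $b$ narrows the neuron's receptive region, which is the sensitivity effect mentioned in the text.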

### 2.3. Probabilistic neural networks

The PNN is a specific variation of the radial basis network [18]. It is used to solve classification problems. **Figure 3** presents the scheme of this network and helps to understand its operation. Like the RBN, the probabilistic neural network has two layers.

The hidden layer is formed and operates just like the same layer of the RBN. It is built from learning patterns united in a matrix $W_1$, so that each learning pattern serves as the weight vector of one radial basis neuron.

The output (classification) layer differs from the RBN output layer. Each class has its own output neuron that sums the radial basis outputs corresponding to the learning patterns of this class; to this end, a matrix $W_2$ formed by 0- and 1-elements is employed. A competitive transfer function $f_2$ finally chooses the class with the largest probability. In this way, the probabilistic network classifies input patterns using a probabilistic measure, which is more realistic than the outputs of a perceptron classifier. The PNN is the most used realization of the Parzen windows (PW) method [18], a nonparametric method that estimates the probability density at a given point (pattern) from the learning patterns of a class.
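A compact Parzen-window (PNN-style) classifier in the spirit of **Figure 3** is sketched below. It assumes a Gaussian kernel with a common width `sigma` for all patterns (a hypothetical parameter standing in for the network spread) and divides each class sum by the class size; with equal class sizes, as in the simulated classifications of this chapter, this gives the same decision as a plain sum. The outputs are normalized so that they can be read as confidence probabilities.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Pattern = Sequence[float]


def pnn_classify(learning: Sequence[Tuple[Pattern, str]], x: Pattern,
                 sigma: float) -> Tuple[str, Dict[str, float]]:
    """Classify x with a Parzen-window estimate of each class density.

    Hidden layer: one Gaussian kernel per learning pattern.
    Summation layer: kernel outputs added per class (the role of the 0/1 matrix W2).
    Output: normalized class probabilities and the most probable class.
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for pattern, label in learning:
        d2 = sum((pi - xi) ** 2 for pi, xi in zip(pattern, x))
        sums[label] += math.exp(-d2 / (2.0 * sigma ** 2))
        counts[label] += 1
    density = {c: sums[c] / counts[c] for c in counts}
    total = sum(density.values())
    if total == 0.0:  # x is far from every pattern: no evidence for any class
        probs = {c: 1.0 / len(density) for c in density}
    else:
        probs = {c: d / total for c, d in density.items()}
    return max(probs, key=probs.get), probs


learning_set = [([0.0, 0.0], "A"), ([0.1, 0.0], "A"),
                ([1.0, 1.0], "B"), ([1.1, 1.0], "B")]
label, probs = pnn_classify(learning_set, [0.05, 0.0], sigma=0.2)
print(label, probs)  # class A with a probability close to 1
```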

### 2.4. k-Nearest neighbors

Like the Parzen windows (PNN), the k-nearest neighbors method is a nonparametric technique [18]. For a given class and point (pattern) $x$, it considers a number $k$ of class patterns in a nearby region of volume $V$ and estimates the required probability density by a simple formula

$$\rho(x) = \frac{k}{nV}, \tag{1}$$

where $n$ stands for the total number of class patterns.

To ensure the convergence of the estimate $\rho$ as $n$ grows, the following requirements must be satisfied:

$$\lim_{n \to \infty} V = 0, \qquad \lim_{n \to \infty} k = \infty, \qquad \lim_{n \to \infty} \frac{k}{n} = 0.$$

To this end, we increase $n$ and can let $V$ be proportional to $1/\sqrt{n}$.

In contrast to the Parzen windows method, which fixes the volume *V* and looks for the number *k*, the K-NN method specifies *k* and seeks the sphere of volume *V*. Since the PW method uses a constant window size, the window may capture no patterns where the actual density is low. The density estimate will then be equal to zero, and the confidence of the classification decision will be underestimated. A solution to this problem is to use a window that depends on the learning data. Following this principle, the K-NN method enlarges a spherical window individually for each class until *k* patterns (nearest neighbors) fall into it, so the sphere radius changes from class to class. The greater the radius, the lower the probability density estimate according to Eq. (1).
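The k-NN estimate of Eq. (1) and its use for class probabilities can be sketched as below. This is a minimal illustration, assuming Euclidean distance, equal class priors, and a small radius floor to avoid division by zero; the one-dimensional data are made up.

```python
import math
from typing import Dict, Sequence

Pattern = Sequence[float]


def ball_volume(radius: float, dim: int) -> float:
    """Volume of a dim-dimensional sphere (ball) of the given radius."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * radius ** dim


def knn_density(class_patterns: Sequence[Pattern], x: Pattern, k: int) -> float:
    """k-NN density estimate rho = k / (n V).

    V is the volume of the smallest sphere around x that holds k class patterns.
    """
    n = len(class_patterns)
    radius = sorted(math.dist(p, x) for p in class_patterns)[k - 1]
    radius = max(radius, 1e-12)  # guard against a zero-volume sphere
    return k / (n * ball_volume(radius, len(x)))


def knn_probabilities(classes: Dict[str, Sequence[Pattern]], x: Pattern,
                      k: int) -> Dict[str, float]:
    """Normalize the class density estimates into confidence probabilities."""
    dens = {c: knn_density(p, x, k) for c, p in classes.items()}
    total = sum(dens.values())
    return {c: d / total for c, d in dens.items()}


classes = {"A": [[0.0], [1.0], [2.0], [3.0]],
           "B": [[5.0], [6.0], [7.0], [8.0]]}
print(knn_density(classes["A"], [0.0], k=2))  # about 0.25: k = 2, n = 4, V = 2 (radius 1 in 1-D)
print(knn_probabilities(classes, [0.5], k=2))  # class A receives most of the probability
```

Because the radius for class B must grow much larger to reach two neighbors, its density estimate and hence its probability are small.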

### 2.5. Support vector network

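Any hyperplane can be written in the space of pattern vectors $x$ as the set of points satisfying

$$w \cdot x - b = 0,$$

where $w$ is the normal vector to the hyperplane and $b$ is the bias. Let us present learning data of two classes as pattern vectors $x_i$ with class labels $y_i \in \{-1, 1\}$, $i = 1, \dots, N$.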

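If the learning data are linearly separable, two parallel hyperplanes with no points between them can be built to divide the data. The hyperplanes can be given by the equations $w \cdot x - b = 1$ and $w \cdot x - b = -1$, and the distance between them, $2/\|w\|$, is called the margin (see **Figure 4**). Intuitively, it measures how good the separation between the two classes is. The points divided in this manner satisfy the following constraint:

$$y_i (w \cdot x_i - b) \ge 1, \quad i = 1, \dots, N.$$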

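The objective of the SVN is to find the hyperplanes that produce the maximal margin or, equivalently, the minimum norm of the vector $w$:

$$\min_{w,\,b} \; \frac{1}{2}\|w\|^2$$

subject to

$$y_i (w \cdot x_i - b) \ge 1, \quad i = 1, \dots, N.$$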

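Introducing the Karush-Kuhn-Tucker (KKT) multipliers $\alpha_i \ge 0$, the constrained problem is transformed into finding the saddle point of the Lagrangian

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N} \alpha_i \left[ y_i (w \cdot x_i - b) - 1 \right]. \tag{6}$$

As can be seen, expression (6) is a function of $w$, $b$, and $\alpha$. Setting its derivatives with respect to $w$ and $b$ to zero gives $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$, and the function can be transformed into the dual form:

$$\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$$

subject to

$$\alpha_i \ge 0, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0.$$

It can also be expressed as:

$$\min_{\alpha} \; \frac{1}{2} \alpha^{T} \mathbf{Q} \alpha - \mathbf{1}^{T} \alpha, \qquad Q_{ij} = y_i y_j \, x_i \cdot x_j,$$

where **Q** is the matrix of quadratic coefficients. This expression is now minimized as a function of $\alpha$ only, which is a standard quadratic programming problem.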

In SVM classification problems, complete separation is not always possible, and reference [21] suggests a flexible (soft) margin that allows misclassification errors while trying to maximize the distance between the nearest correctly separated points. Another way to split non-separable classes is to use nonlinear (kernel) functions, as proposed in reference [22]. Among them, radial basis functions are recommended [23].

SVMs were originally intended for binary classification; however, they can now address multi-class problems using the One-Versus-All and One-Versus-One strategies.
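The One-Versus-One strategy can be illustrated with a short sketch: one binary classifier per pair of classes votes, and the class with the most votes is chosen. The linear decision rules below are hypothetical stand-ins for trained SVMs of the form $w \cdot x - b$; only the voting logic is the point here. One-Versus-All works analogously, with one classifier per class separating it from all others.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Binary = Callable[[Sequence[float]], int]  # +1 -> first class of the pair, -1 -> second


def linear_rule(w: Sequence[float], b: float) -> Binary:
    """Decision rule of a (hypothetical) trained linear SVM: sign(w . x - b)."""
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) - b >= 0 else -1


def one_vs_one(classes: Sequence[str], pair_rules: Dict[Tuple[str, str], Binary],
               x: Sequence[float]) -> str:
    """Each pairwise classifier votes; the class with the most votes wins."""
    votes: Counter = Counter()
    for a, c in combinations(classes, 2):
        votes[a if pair_rules[(a, c)](x) > 0 else c] += 1
    return votes.most_common(1)[0][0]


# Three classes on a line: A (x < 1), B (1 < x < 2), C (x > 2)
rules = {("A", "B"): linear_rule([-1.0], -1.0),   # A if x <= 1
         ("A", "C"): linear_rule([-1.0], -1.5),   # A if x <= 1.5
         ("B", "C"): linear_rule([-1.0], -2.0)}   # B if x <= 2
print([one_vs_one("ABC", rules, [v]) for v in (0.5, 1.7, 3.0)])  # ['A', 'B', 'C']
```

With $q$ classes, One-Versus-One needs $q(q-1)/2$ binary classifiers, each trained only on the patterns of its two classes.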

A gas turbine diagnostic process using the techniques described above is simulated according to the following approach.

## 3. Neural networks-based diagnostic approach

The approach described corresponds to the diagnostic stages of feature extraction and fault localization and embraces the steps of fault simulation, feature extraction, fault classification formation, making a recognition decision, and recognition accuracy estimation.

### 3.1. Fault simulation

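Within the scope of this chapter, faults of engine components (compressor, turbine, combustor, etc.) are simulated by means of a nonlinear gas turbine thermodynamic model.

The model determines monitored variables $\vec{Y}$ as a function of the operating conditions $\vec{U}$ and the fault parameters $\vec{\Theta}$, $\vec{Y} = \vec{Y}(\vec{U}, \vec{\Theta})$; a fault is embedded into the model by changing the corresponding fault parameters by $\Delta\vec{\Theta}$.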

### 3.2. Feature extraction

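Although gas turbine monitored variables are affected by engine deterioration, the influence of the operating conditions is much more significant. To extract diagnostic information from raw measured data, a deviation (fault feature) is computed for each monitored variable as a difference between the actual and baseline values. With the thermodynamic model, the deviations $Z_i$, $i = 1, \dots, m$, induced by the fault parameters are calculated for all $m$ monitored variables according to the following expression:

$$Z_i = \frac{Y_i(\vec{U}, \vec{\Theta}_0 + \Delta\vec{\Theta}) - Y_i(\vec{U}, \vec{\Theta}_0)}{Y_i(\vec{U}, \vec{\Theta}_0)},$$

where $\vec{U}$ denotes the operating conditions, $\vec{\Theta}_0$ the fault parameters of a healthy (baseline) engine, and $\Delta\vec{\Theta}$ their changes induced by a fault.

A random error $\varepsilon_i$ is added to each deviation to simulate measurement inaccuracy:

$$Z_i^* = Z_i + \varepsilon_i. \tag{10}$$

Deviations of the monitored variables united in an $(m \times 1)$ deviation vector $\vec{Z}^*$ form a pattern that represents the engine condition in the diagnostic space.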
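The generation of noisy patterns for one single-fault class can be sketched as below. The sketch does not reproduce the thermodynamic model: a hypothetical linear influence matrix (deviation per unit fault parameter change) stands in for it, fault severity is drawn uniformly, and errors are Gaussian with assumed standard deviations. A validation set built the same way only needs a different random seed.

```python
import random
from typing import List, Sequence, Tuple


def simulate_class(influence: Sequence[Sequence[float]], fault_index: int,
                   severity: Tuple[float, float], noise_std: Sequence[float],
                   n_patterns: int, rng: random.Random) -> List[List[float]]:
    """Return n_patterns noisy deviation vectors Z* = Z + eps for one single-fault class.

    influence[i][j]: HYPOTHETICAL linearized effect of fault parameter j on deviation i
    (a stand-in for the nonlinear thermodynamic model).
    """
    patterns = []
    for _ in range(n_patterns):
        d_theta = rng.uniform(*severity)                        # random fault severity
        z = [row[fault_index] * d_theta for row in influence]   # error-free deviations
        patterns.append([zi + rng.gauss(0.0, s) for zi, s in zip(z, noise_std)])
    return patterns


# Hypothetical 3 deviations x 2 fault parameters
H = [[1.0, -0.5], [0.6, 0.9], [-0.4, 0.3]]
learning = simulate_class(H, 0, (-5.0, 0.0), [0.3, 0.3, 0.3], 1000, random.Random(1))
validation = simulate_class(H, 0, (-5.0, 0.0), [0.3, 0.3, 0.3], 1000, random.Random(2))
print(len(learning), len(learning[0]))  # 1000 3
```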

### 3.3. Fault classification formation

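Numerous gas turbine faults are divided into a limited number $q$ of classes $D_1, D_2, \dots, D_q$. A single fault class is formed by changes of one fault parameter, and a multiple fault class by simultaneous changes of several fault parameters.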

To form one class, many patterns are computed by expression (10), with the required fault parameter changes and random errors generated from given distributions. A set **Z1** uniting the patterns of all classes presents a whole pattern-based fault classification. **Figure 5** illustrates such a classification by presenting four single fault classes in the diagnostic space of three deviations.

### 3.4. Making a fault recognition decision

In addition to the classification (learning set) **Z1**, a classification technique (one of the chosen networks) is an integral part of the whole diagnostic process: learned on **Z1**, it assigns a given (observed) pattern to one of the classes. To apply and test the classification techniques, a validation set **Z2** is also created in the same way as set **Z1**. The two sets differ only in the random numbers, which are generated from the same distributions.

### 3.5. Recognition accuracy estimation

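It is of practical interest to know the recognition accuracy averaged for each fault class and for a whole engine. To this end, the classification technique is successively applied to the patterns of set **Z2**, producing diagnoses $d_l$. Since the true fault classes $D_j$ are also known, the probabilities of correct diagnosis (true positive rates) $P_j = P(d_j \mid D_j)$ are computed for each class, and their mean $\bar{P} = \frac{1}{q}\sum_{j=1}^{q} P_j$ estimates the diagnosis reliability for the whole engine.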
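The accuracy estimation reduces to counting. A minimal sketch, assuming the diagnoses for the validation patterns have already been produced by some classifier (the labels below are made up):

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def correct_diagnosis_probabilities(true_classes: Sequence[str],
                                    diagnoses: Sequence[str]) -> Tuple[Dict[str, float], float]:
    """P_j: share of validation patterns of class D_j diagnosed as D_j; plus the mean over classes."""
    totals = Counter(true_classes)
    hits = Counter(t for t, d in zip(true_classes, diagnoses) if t == d)
    p = {c: hits[c] / n for c, n in totals.items()}
    return p, sum(p.values()) / len(p)


p_j, p_mean = correct_diagnosis_probabilities(["D1", "D1", "D2", "D2"],
                                              ["D1", "D2", "D2", "D2"])
print(p_j, p_mean)  # {'D1': 0.5, 'D2': 1.0} 0.75
```

Averaging per class, rather than over all patterns, keeps a large class from masking poor recognition of a small one.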

## 4. Optimization of the neural networks-based diagnostic process

The structure and efficiency of a diagnostic algorithm depend on many factors and on the options that can be chosen for each factor. The classification of these factors and options is given in **Figure 6**, where the factors are shown in the first row. On the basis of accumulated knowledge and experience, every research center (or even a single researcher) chooses an appropriate option for each factor and develops its own diagnostic algorithm. To be optimal, this algorithm should take into account all peculiarities of a given engine, its application, and other diagnostic conditions. Thus, the algorithm is unlikely to be optimal for other engines and applications. As a result, every monitoring system needs its own appropriate diagnostic algorithm.

Thus, comparing complete diagnostic algorithms does not seem to be useful. Instead, comparing options for each above factor and choosing the best option are proposed. When options of one factor are compared, the other factors (comparison conditions) are fixed forming a comparison case. To draw sound conclusions about the best option, the comparison should be repeated for many comparison cases. To form these cases, each comparison condition varies independently according to the theory of the design of experiments. Since every new condition drastically increases the volume of comparative calculations, the most significant conditions are considered first.
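One simple way to form comparison cases, assuming a full factorial design (every combination of condition levels), is sketched below. The full factorial design is assumed here for illustration; the conditions follow the variations used in Section 7.2, and the operating-mode labels are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence


def comparison_cases(conditions: Dict[str, Sequence[str]]) -> List[Dict[str, str]]:
    """Full factorial design: one comparison case per combination of condition levels."""
    names = list(conditions)
    return [dict(zip(names, levels)) for levels in product(*conditions.values())]


conditions = {
    "engine": ["Engine 1", "Engine 2"],
    "operating mode": ["mode 1", "mode 2"],          # hypothetical labels
    "classification": ["single faults", "multiple faults"],
}
cases = comparison_cases(conditions)
print(len(cases))  # 8 cases; each compared option is evaluated on all of them
print(cases[0])    # {'engine': 'Engine 1', 'operating mode': 'mode 1', 'classification': 'single faults'}
```

The case count is the product of the level counts, which is why each added condition multiplies the computational volume and the most significant conditions are considered first.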

To perform the comparative calculations, a test procedure based on the above-described approach has been developed in Matlab (MathWorks, Inc.). For each compared option, the procedure executes numerous cycles of gas turbine fault diagnosis by the chosen technique and finally computes a diagnosis reliability indicator, which is used as a comparison criterion.

Three gas turbine engines (Engine 1, Engine 2, and Engine 3) of different construction and application have been chosen as test cases. Engine 1 and Engine 2 are free turbine power plants. Engine 1 is a natural gas compressor driver; it is represented in the investigations by its thermodynamic model and recorded field data. Engine 2 is intended for electricity production and is represented by field data. Engine 3 is a three-spool turbofan for a transport aircraft; its thermodynamic model is used. The field data, called hourly snapshots, are filtered and averaged steady-state values recorded every hour over about one year of operation of Engine 1 and Engine 2. Since the data include periods of compressor fouling and points of washing, they are well suited for testing diagnostic techniques.

Using the network-based approach described in Section 3 and the information about the test case engines, many investigations have been conducted to improve the diagnostic process at the stages of feature extraction, fault detection, and fault localization. The results achieved for the feature extraction stage are described in the next section.

## 5. Feature extraction stage optimization

As stated in Section 3, the deviations are useful diagnostic features. Although the thermodynamic model can be used as a baseline model for computing the deviations, it is too complex for real monitoring systems and has intrinsic inaccuracy. As mentioned in the introduction, to build a simple and fast data-driven baseline model, only neural networks, in particular the MLP, are applied. On the other hand, in the previous studies we successfully used a polynomial type baseline model. It was therefore decided [24] to verify whether the application of such a powerful approximator as the MLP instead of polynomials yields higher adequacy of the baseline model and better quality of the corresponding deviations.

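Given a measured value $Y^*$ of a monitored variable and its baseline value $Y_0(\vec{U})$ computed by a baseline model for the same operating conditions $\vec{U}$, the deviation is computed from the difference between them, as described in Section 3.2.

For one monitored variable, a complete second-order polynomial function of four arguments (operating conditions) $u_1, \dots, u_4$ is written as

$$Y_0 = a_0 + \sum_{j=1}^{4} a_j u_j + \sum_{j=1}^{4} \sum_{l=j}^{4} a_{jl} u_j u_l. \tag{12}$$

For all *m* monitored variables and measurements at *n* operating points, equation (12) is transformed into a linear system $\mathbf{Y} = \mathbf{V}\mathbf{A}$ with matrices **Y** (*n×m*) and **V** (*n×k*) formed from these data, where *k* = 15 is the number of coefficients. To improve the coefficient estimates (matrix **A**, *k×m*), a large volume of input data (*n* ≫ *k*) is involved, and the least-squares method is applied: $\hat{\mathbf{A}} = (\mathbf{V}^T\mathbf{V})^{-1}\mathbf{V}^T\mathbf{Y}$.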
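The polynomial baseline model of Eq. (12) and its least-squares fit can be sketched in plain Python, shown here for one monitored variable (one column of **Y**). The sketch solves the normal equations $(\mathbf{V}^T\mathbf{V})\mathbf{a} = \mathbf{V}^T\mathbf{y}$ by Gaussian elimination; a QR-based solver (for example, the MATLAB backslash operator) is numerically safer in practice. The data are synthetic, generated from known coefficients so that the fit can be checked.

```python
import random
from typing import List, Sequence


def quad_terms(u: Sequence[float]) -> List[float]:
    """Regressors of a complete second-order polynomial: 1, u_j, and u_j * u_h for j <= h."""
    terms = [1.0] + [float(v) for v in u]
    for j in range(len(u)):
        for h in range(j, len(u)):
            terms.append(u[j] * u[h])
    return terms  # 15 terms for four operating conditions


def solve(A: List[List[float]], y: List[float]) -> List[float]:
    """Solve the square system A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [yi] for row, yi in zip(A, y)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def fit_baseline(U: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients from the normal equations (V^T V) a = V^T y."""
    V = [quad_terms(u) for u in U]
    k = len(V[0])
    VtV = [[sum(row[i] * row[j] for row in V) for j in range(k)] for i in range(k)]
    Vty = [sum(row[i] * yi for row, yi in zip(V, y)) for i in range(k)]
    return solve(VtV, Vty)


def baseline(a: Sequence[float], u: Sequence[float]) -> float:
    """Baseline value Y0(U) predicted by the fitted polynomial."""
    return sum(ai * ti for ai, ti in zip(a, quad_terms(u)))


# Synthetic check: n = 200 operating points (n >> k = 15), known coefficients
rng = random.Random(0)
U = [[rng.uniform(0.0, 1.0) for _ in range(4)] for _ in range(200)]
true_a = [0.1 * i for i in range(15)]
y = [baseline(true_a, u) for u in U]
a_hat = fit_baseline(U, y)
max_coef_error = max(abs(e - t) for e, t in zip(a_hat, true_a))
print(max_coef_error)  # close to zero for noise-free data
```

With real hourly snapshots, the same fit would be repeated for each of the *m* monitored variables, and the deviations would then be computed from the measured values and `baseline(a_hat, u)`.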

As to the perceptron, its typical input is formed by four operating conditions, and the output consists of seven monitored variables. Hidden layer size determines a network’s capability to approximate complex functions and varies in calculations. As a result of MLP tuning, we chose 12 nodes at this layer. Thus, the perceptron structure is written as 4×12×7. Since the MLP has tan-sigmoid transfer functions, and the output varies within the interval (−1, 1), all monitored quantities are normalized.

Many comparison cases on simulated and real data of Engines 1 and 2 were analyzed. The MLP was sometimes more accurate at the learning step. At the validation step, however, the deviations computed with the MLP were slightly less accurate for Engine 1. For Engine 2, the best MLP validation results are illustrated in **Figure 7**. As can be seen here, both polynomial deviations dTtp and network deviations dTtn reflect the fouling and washing effects equally well. In many other cases, however, the polynomials performed better. Why does the network approximate a learning set well but frequently fail on a validation set? The answer seems evident: an overlearning (overfitting) effect. Owing to its greater flexibility, the network begins to follow data peculiarities induced by measurement errors in the learning set and therefore describes the gas turbine baseline performance less accurately for the validation set.

Although the MLP, as a powerful approximation technique, promised a better description of gas turbine performance, the results of the comparison were somewhat surprising: no manifestations of network superiority were detected. When comparing these techniques, it is also necessary to take into consideration that the MLP learning procedure is more complex because it is numerical, in contrast to the analytical solution for polynomials. Thus, a polynomial baseline model can be successfully used in real monitoring systems along with neural networks. At least, this seems to be true for simple cycle gas turbines with gradually changing performance, like the turbines considered in this chapter.

## 6. Fault detection stage optimization

As mentioned in the Introduction, fault detection is currently based on tolerances (thresholds). However, it seems reasonable to present it as a pattern recognition problem, as we do at the fault recognition stage. The classification in **Figure 5** corresponds to a hypothetical fleet of engines with different faults of variable severity. To form the classification for fault detection, we can reasonably assume that the engine fleet and the distributions of faults are the same. Paper [25] explains how to use the patterns of the existing classification for this purpose: a boundary sphere of radius *R* = 1 is set in the diagnostic space around the origin (a healthy engine). The patterns whose true deviation vectors (without errors) lie inside the sphere form the healthy engine class; the others form the faulty engine class. The patterns (deviation vectors with noise) of these two classes partly intersect, which results in α- and β-errors during detection. **Figure 8** illustrates the new classification; the intersection is clearly seen. Two variations of the new classification, based on single and multiple original classes, have been prepared.
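The splitting rule described above can be sketched as follows, assuming the Euclidean length of the error-free deviation vector and the boundary radius *R* = 1; the vectors are made up. The second function counts patterns whose noisy vector falls on the other side of the sphere than their class label, which is one way to quantify the class intersection.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def norm(z: Vector) -> float:
    """Euclidean length of a deviation vector."""
    return math.sqrt(sum(v * v for v in z))


def detection_labels(true_vectors: Sequence[Vector], R: float = 1.0) -> List[str]:
    """'healthy' if the error-free deviation vector lies inside the sphere of radius R."""
    return ["healthy" if norm(z) <= R else "faulty" for z in true_vectors]


def crossing_count(noisy_vectors: Sequence[Vector], labels: Sequence[str], R: float = 1.0) -> int:
    """Number of patterns whose noisy vector lies on the other side of the boundary."""
    return sum(1 for z, lab in zip(noisy_vectors, labels)
               if (norm(z) <= R) != (lab == "healthy"))


true_z = [[0.3, 0.4], [0.9, 0.9], [0.6, 0.6]]
noisy_z = [[0.35, 0.38], [0.7, 0.6], [0.8, 0.7]]
labels = detection_labels(true_z)
print(labels)                           # ['healthy', 'faulty', 'healthy']
print(crossing_count(noisy_z, labels))  # 2
```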

Since the new pattern-based classification (learning and testing sets) is ready, any recognition technique can be used to perform fault detection, and the MLP was selected once more. It retained the sigmoid transfer functions and the hidden layer size of 12. Given that a threshold-based approach, which classifies pattern vectors according to their length, is traditionally used in fault detection, an algorithm with a distance measure (*r*-criterion) was also developed and compared with the MLP. Since the consequences of α- and β-errors are quite different (the α-error is always considered more dangerous), reduced losses, in which the two error types are weighted differently, were used as the comparison criterion.

**Figure 9** shows the plots of the reduced losses versus the radius *r*. For the MLP the change of *r* was simulated by the corresponding change of the boundary radius *R* during pattern separation in the learning set. It can be seen that the introduction of an additional threshold *r*, which is different from the boundary, reduces monitoring errors for both techniques. The best results correspond to the minimums of the curves. By comparing them, we can conclude that the network (MLP) provides better results for single classes, and the techniques are equal for multiple classes. In general for all comparison cases, the MLP slightly outperforms the *r*-criterion-based technique. Thus, the perceptron can be successfully applied for real gas turbine fault detection.
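The *r*-criterion and the search for the threshold that minimizes a weighted loss can be sketched as below. The loss used here, a weighted sum of the two error rates, is an assumption standing in for the reduced losses of [25]; the weights `w_missed` and `w_false_alarm` are hypothetical and should be set to reflect which error type is considered more dangerous.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def r_criterion(z: Vector, r: float) -> str:
    """Threshold rule: the engine is declared faulty when the length of Z* exceeds r."""
    return "faulty" if math.sqrt(sum(v * v for v in z)) > r else "healthy"


def weighted_loss(patterns: Sequence[Vector], labels: Sequence[str], r: float,
                  w_missed: float, w_false_alarm: float) -> float:
    """Weighted sum of the missed-fault rate and the false-alarm rate (assumed loss form)."""
    n_faulty = max(labels.count("faulty"), 1)
    n_healthy = max(labels.count("healthy"), 1)
    missed = sum(1 for z, lab in zip(patterns, labels)
                 if lab == "faulty" and r_criterion(z, r) == "healthy")
    false_alarms = sum(1 for z, lab in zip(patterns, labels)
                       if lab == "healthy" and r_criterion(z, r) == "faulty")
    return w_missed * missed / n_faulty + w_false_alarm * false_alarms / n_healthy


def best_threshold(patterns, labels, radii, w_missed=1.0, w_false_alarm=1.0):
    """Threshold r with the smallest weighted loss (the minimum of a loss-versus-r curve)."""
    return min(radii, key=lambda r: weighted_loss(patterns, labels, r, w_missed, w_false_alarm))


patterns = [[0.5, 0.0], [0.8, 0.0], [1.5, 0.0], [2.0, 0.0]]
labels = ["healthy", "healthy", "faulty", "faulty"]
print(best_threshold(patterns, labels, [0.6, 1.0, 1.8]))  # 1.0
```

Too small an *r* produces false alarms and too large an *r* misses faults, so the loss curve has an interior minimum, as in the plots described above.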

## 7. Fault localization stage optimization

To draw sound conclusions about ANN applicability to gas turbine fault localization, the comparison of the chosen networks was repeated for many comparison cases formed by independent variation of the main influencing factors: engines, operating modes, simulated or real information, and class types. In this way, not only is the best network chosen, but the influence of these factors on diagnosis results is also determined, which helps optimize the total diagnostic process. For a correct comparison, each network was tailored to the specific task to be solved.

### 7.1. Neural network tuning

We began applying and tuning the networks with the MLP [26]. The numbers of monitored variables and fault classes unambiguously determine the sizes of the input and output layers of this network. As to the hidden layer, 12 nodes were estimated to be optimal using the averaged probability of correct diagnosis $\bar{P}$ as the criterion.

**Figure 10** illustrates another example of the tuning. Averaged probabilities $\bar{P}$ of correct diagnosis computed for the PNN are plotted here against the spread *b*, the only PNN tuning parameter. To determine this probability with a high precision of about ±0.001, the calculations over the range of *b* were repeated for two operating modes of Engine 1 and for two fault class types. It can be seen in the figure that the highest values of $\bar{P}$ correspond to *b* = 0.35 for the single fault type and *b* = 0.40 for the multiple one.
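The tuning loop behind a plot of probability versus spread can be sketched as a grid search. The `evaluate` function is a hypothetical placeholder: in a real study it would train the PNN with a given spread on a learning set and return the mean probability of correct diagnosis on a validation set for one comparison case; here a toy curve is used so that the sketch runs.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple


def tune_spread(candidates: Sequence[float],
                evaluate: Callable[[float, str], float],
                cases: Sequence[str]) -> Tuple[float, float]:
    """Return the spread with the highest probability averaged over comparison cases."""
    scores = {b: mean([evaluate(b, case) for case in cases]) for b in candidates}
    best = max(scores, key=scores.get)
    return best, scores[best]


def toy_evaluate(b: float, case: str) -> float:
    # Hypothetical stand-in with a maximum at b = 0.35; not real diagnosis results
    return 0.9 - (b - 0.35) ** 2


spreads = [round(0.05 * i, 2) for i in range(4, 11)]   # 0.20 ... 0.50
cases = ["mode 1", "mode 2"]                            # hypothetical case labels
print(tune_spread(spreads, toy_evaluate, cases))        # best spread 0.35 for this toy curve
```

Each `evaluate` call implies a full learning and validation run, which is why tuning dominates the total investigation time.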

For all networks, 1000 simulated patterns per fault class were selected as a tradeoff between the required computer resources and the accuracy of the probabilities of correct diagnosis.

It is worth mentioning that network tuning is very time consuming: it can occupy up to 80% of the total investigation time, leaving 20% for the calculations related to the final learning and validation of the networks.

### 7.2. Neural network comparison

The comparison of three tuned networks, MLP, RBN, and PNN, was first performed in reference [27]; then the SVN was also evaluated. The comparison conditions were varied by independent changes of two engines, two operating modes, and two classification variations. The resulting averaged probabilities of correct diagnosis are given in **Table 1**. We can see that all networks are practically equal in accuracy for all comparison cases.

Paper [28] provides some additional results, extending the comparison to the K-NN technique. The data given in **Table 2** confirm the conclusion about equal performance, now for five different techniques.

The PNN and K-NN have probabilistic outputs, and every pattern recognition decision is accompanied by a confidence probability. This is an important advantage for gas turbine diagnosticians and maintenance staff. It can be taken into account in choosing the best technique when the mean diagnosis reliability of the compared techniques is practically equal.

The results of the comparison by the estimated confidence probability are illustrated in **Figure 11**, where the errors of the PNN, K-NN, and MLP estimates are plotted for 100 patterns. One can see that the bias and scatter of the K-NN estimates are by far greater. As to the MLP outputs, these non-probabilistic quantities look much more precise than the K-NN probability estimates and seem to have the same precision level as the PW-PNN estimates.

**Table 3** presents the mean estimation errors for the case of the single fault classification. The table data confirm the above conclusion on the compared techniques: the bias and standard deviation of the K-NN errors are by far greater. The table also shows that, on average, the MLP outputs are even more exact than the PNN probabilities. This is one more argument for applying the perceptron in real gas turbine monitoring systems.

### 7.3. Fault classification extension

In the investigations previously described, only two rigid classifications were maintained: one formed by single fault classes and the other constituted from multiple fault classes created by two fault parameters. However, the classification can vary a lot in practice even for the same engine, and it is difficult to predict what classification variation will be finally used in a real monitoring system. To verify and additionally compare the networks for different classification variations, the test procedure was modified for easily creating any new fault classification, more complex and more realistic than the classifications previously analyzed.

Twelve classification variations were prepared, and three networks, MLP, RBN, and PNN, were examined in reference [29]. These classifications have from 4 to 18 gas path and sensor fault classes, 1 to 4 fault parameters to form each class, and positive and negative fault parameter changes. All the networks operated successfully for all fault classifications. **Table 4** shows the resulting averaged probabilities of correct diagnosis. Analyzing them, one can state that the differences between the networks within the same classification remain small (except for variation 6), about 0.015 (1.5%), while the difference between the variations can reach 0.10. Thus, these results reaffirm once more the earlier conclusion that many recognition techniques may yield the same gas turbine diagnosis accuracy.

### 7.4. Real data-based classification

Gas path mathematical models are widely used to build the fault classification required for diagnostics because faults rarely occur during field operation. As a result, model errors are transmitted to the model-based classification. Paper [30] looks at the possibility of creating a mixed fault classification that incorporates both model-based and data-driven fault classes. Such a classification combines the broad fault coverage of model-based classes with higher diagnostic accuracy for the data-driven classes. Engine 1 has been chosen as a test case. Its real data with two periods of compressor fouling were used to form a data-driven fouling class. **Figure 12** illustrates simulated (without errors) and real data.

Different variations of the classification were considered and compared using the MLP. Despite the irregular distribution of real patterns, the MLP operated normally at the learning and validation steps. We also found that the perceptron trained on simulated data makes 30% recognition errors when applied to real compressor fouling data. However, the use of mixed learning data reduces these errors to 3%. It was also shown how to form a representative real fault class that ensures minimal recognition errors.

Paper [31] presents another way to enhance gas turbine fault classification using real information. Diagnostic algorithms widely use theoretical random number distributions to simulate measurement errors. Such simulation differs from real diagnosis because the diagnostic algorithms work with the deviations, which have other error components that differ from simulated errors by amplitude and distribution. As a result, simulation-based investigations might result in too optimistic conclusions on gas turbine diagnosis reliability. To make error presentation more realistic, it was proposed in reference [31] to extract an error component from real deviations and to integrate it in fault description.

Using simulated and real data of Engine 1, six alternative variations of deviation error were integrated in the fault classification. Diagnosis was performed by the MLP, and the diagnosis reliability was estimated for each variation. Despite irregular real error distribution, the MLP successfully operated for all the variations. Experiments with error representation variations have shown what can happen when the classification formed with accurate simulated deviations is applied to classify less accurate real deviations. In that case, the diagnosis accuracy can fall from

The fault classifications with integrated real errors were used in reference [32] to compare three networks: MLP, RBN, and PNN, one more time. All networks operated well and they differed in accuracy indicators

### 7.5. Different operating conditions

Many known studies show that grouping the data collected at different engine operating modes to make a single diagnosis (multipoint diagnosis) yields higher diagnostic accuracy than traditional one-point methods. However, it is of practical interest to know how significant the accuracy increment is and how it can be explained. Diagnosis of engines at dynamic modes poses similar questions. To make one diagnosis, this technique combines data from successive measurement sections of a transient operating mode and in this respect resembles multipoint diagnosis.

Paper [33] analyzes the influence of operating conditions on diagnostic accuracy by comparing the one-point, multipoint, and transient options. The MLP is used as the pattern recognition technique. Despite a significant increase in input dimensionality, the perceptron operated well for all options.
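The growth of input dimensionality follows directly from how a multipoint (or transient) pattern is assembled: deviation vectors from several operating points or measurement sections are joined into one input vector. A minimal sketch, assuming patterns are stored as an array of shape `(n_patterns, k_points, m_variables)`; the layout and function name are hypothetical.

```python
import numpy as np

def to_multipoint_inputs(patterns):
    """Join the m deviations measured at each of k operating points (or
    successive transient sections) into one input vector of length k*m.
    patterns: array-like of shape (n_patterns, k_points, m_variables)."""
    patterns = np.asarray(patterns, dtype=float)
    n, k, m = patterns.shape
    # Row-major reshape keeps the deviations of each point contiguous.
    return patterns.reshape(n, k * m)
```

With a single point (`k = 1`) this reduces to the one-point input, so the network input layer grows by a factor of `k` when switching to the multipoint option.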

The calculations revealed that network training has some peculiarities in multipoint diagnosis. They are illustrated in **Figure 13**, which plots the perceptron error versus training epochs for one-point and multipoint diagnosis. For the one-point option, the error curves of the training and validation processes almost coincide, both decrease ever more slowly as training proceeds, and the total number of epochs (300) is relatively large. These are indications that no over-training occurs. The perceptron applied to multipoint diagnosis behaves quite differently: the validation curve lags behind the training curve after the 30th epoch, this gap grows rapidly, and training stops earlier (at 108 epochs) because of over-training. We can conclude that the Early Stopping Option is more necessary here. These differences can be explained by the ratio of the training data volume to the number of unknown perceptron parameters. In both cases, the training set contains 7000 patterns, but the numbers of unknown quantities differ significantly: 144 in the first case and 1540 in the second. Consequently, in multipoint diagnosis the trained network is much more flexible, and over-training becomes possible. Increasing the reference set volume could improve the training process; however, such an increase is currently limited by computation time.
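The argument above can be made concrete with a little arithmetic and a generic early-stopping rule. With 7000 training patterns, the one-point network has about 48.6 patterns per unknown parameter (7000/144), whereas the multipoint network has only about 4.5 (7000/1540). The sketch below computes this ratio, counts the weights and biases of a fully connected MLP (the chapter does not state the layer sizes, so any architecture passed in is hypothetical), and implements a patience-based stopping rule; the patience value of 6 is an assumption, not the setting used in the study.

```python
def patterns_per_parameter(n_patterns, n_parameters):
    """Training patterns available per unknown network parameter."""
    return n_patterns / n_parameters

def mlp_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected MLP, e.g. [inputs, hidden, outputs]."""
    return sum((a + 1) * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

def early_stopping_epoch(validation_errors, patience=6):
    """Return the 0-based epoch with the lowest validation error, stopping
    once the error has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_error, wait = 0, float("inf"), 0
    for epoch, err in enumerate(validation_errors):
        if err < best_error:
            best_epoch, best_error, wait = epoch, err, 0
        else:
            wait += 1
            if wait >= patience:
                break  # validation error keeps rising: over-training
    return best_epoch

print(round(patterns_per_parameter(7000, 144), 1))   # one-point option
print(round(patterns_per_parameter(7000, 1540), 1))  # multipoint option
```

The network weights saved at the returned epoch, rather than at the last epoch, are the ones an early-stopping scheme would keep.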

The results of the option comparison, expressed as probabilities, are given in **Table 5**. One can see that the overall gain in diagnostic accuracy from switching to multipoint diagnosis, that is, from joining data from different steady states, is significant: the diagnosis errors decrease by a factor of two to five. Diagnosis at transients increases the accuracy further, but only slightly. This positive effect of data joining was found to be mainly due to averaging of the input data, which smooths the random measurement errors.
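The averaging explanation can be checked with a small Monte Carlo experiment. Assuming independent, zero-mean Gaussian measurement errors with standard deviation σ at each of k operating points (an idealization, not a property of the real data in the chapter), the mean of the k errors has standard deviation σ/√k. The function name and parameters below are hypothetical.

```python
import numpy as np

def averaged_noise_std(sigma, n_points, n_trials=20000, seed=0):
    """Monte Carlo estimate of the standard deviation of the mean of
    n_points independent N(0, sigma^2) measurement errors."""
    rng = np.random.default_rng(seed)
    errors = rng.normal(0.0, sigma, size=(n_trials, n_points))
    return errors.mean(axis=1).std()

for k in (1, 4, 9):
    print(k, averaged_noise_std(1.0, k), 1.0 / np.sqrt(k))
```

If the errors at different points are positively correlated, the reduction is smaller than 1/√k, so the idealized figure is an upper bound on the smoothing benefit.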

## 8. Conclusions

A monitoring system comprises many elements, and many factors influence the final diagnostic accuracy. The present chapter has generalized our investigations aimed at enhancing this system by choosing the best option for each element. In every investigation, a diagnostic process was simulated mainly on the basis of neural networks, and we focused on reaching the highest accuracy by choosing the best network and tuning it optimally to the problem at hand. As can be seen, all the examined techniques (MLP, RBN, PNN, SVN, and K-NN) use a pattern-based classification. Such a classification can be formed from complex classes in which faults are simulated by the nonlinear thermodynamic model. Moreover, this classification can be described by real fault displays, which completely excludes the negative effect of model inaccuracy. Thus, as objects of investigation and optimization, neural networks help to enhance the whole monitoring system. The conducted investigations have yielded several methods for increasing diagnostic accuracy, which were proposed and proven. The chapter also provides recommendations on choosing and tailoring networks for different diagnostic tasks. For many tasks, the utility of the multilayer perceptron has been proven on both simulated and real data.