Open access

A Hybrid Fuzzy System for Real-Time Machinery Health Condition Monitoring

Written By

Wilson Wang

Published: 01 February 2010

DOI: 10.5772/7221

From the Edited Volume

Fuzzy Systems

Edited by Ahmad Taher Azar


1. Introduction

Rotary machinery is widely used in various types of engineering systems ranging from simple electric fans to complex machinery systems such as aircraft. A reliable online condition monitoring system is very useful in industries both as a quality control scheme and as a maintenance tool. In quality control, the early detection of faulty components can prevent machinery performance degradation and malfunction. As a maintenance tool, machinery health condition monitoring enables the establishment of a maintenance program based on an early warning. This can be of great value in cases involving critical machines (e.g., airplanes, power turbines, and chemical engineering facilities), where an unexpected shutdown can have serious economic or environmental consequences.

Condition monitoring is the act of diagnosing faults by means of appropriate observations from different information carriers, such as temperature, acoustics, lubricant, or vibration. Vibration-based monitoring is the most commonly used approach in industry because of its ease of measurement, and it is the approach used in this study.

Fault diagnosis is a sequential process involving two steps: representative feature extraction and pattern classification. Feature extraction is a mapping process from the measured signal space to the feature space. Representative features associated with the health condition of a machinery component (or subsystem) are extracted by using appropriate signal processing techniques. Pattern classification is the process of classifying the characteristic features into different categories. The classical approach, which is also widely used in industry, relies on human expertise to relate the vibration features to the faults. This method, however, is tedious and not always reliable when the extracted features are contaminated by noise. Furthermore, it is difficult for a diagnostician to deal with contradictory symptoms if multiple features are used. The alternative is to use analytical tools (Li & Lee, 2005; Gusumano et al., 2002) or data-driven paradigms (Isermann, 1998). The latter are used in this work because an accurate mathematical model is difficult to derive for a complex mechanical system, especially when it operates in noisy environments. Data-driven diagnostic classification can be performed by reasoning tools such as neural networks (Rish et al., 2005; Uluyol, 2006), fuzzy logic (Mansoori et al., 2007; Ishibuchi & Yamamoto, 2005), and neural fuzzy synergetic schemes (Wang, 2008; Uluyol et al., 2006).

Even though several techniques have been proposed in the literature for machinery condition monitoring, implementing a diagnostic tool for real-world monitoring applications remains a challenge because of the complexity of machinery structures and operating conditions. When a monitoring system is used in real-time industrial applications, the critical issue is its reliability. Missed alarms (i.e., the monitoring system fails to pick up existing faults) and false alarms (i.e., the monitoring system triggers an alarm because of noise instead of real faults) seriously undermine its validity. To tackle these challenges, the objective of this research work is to develop a new technique, an integrated classifier, for real-time condition monitoring, especially in gear transmission systems. In this novel classifier, the monitoring reliability is enhanced by integrating the object’s future states as forecast by a multiple-step predictor; furthermore, the diagnostic scheme is adaptively trained by a novel recursive hybrid algorithm to improve its convergence and adaptive capability.

This chapter is organized as follows: Section 2 describes the integrated classifier, whereas the multiple-step predictor and the monitoring indices are described in Section 3. Section 4 discusses the hybrid online training algorithm. In Section 5, the viability of the proposed integrated classifier is verified by experimental tests corresponding to different gear conditions.


2. Diagnostic system

The diagnostic classifier integrates the selected features obtained by appropriate signal processing techniques. The purpose is to make a more positive assessment of the health condition of the mechanical component (or subsystem) of interest. The diagnostic reliability of this classifier is enhanced by incorporating the future (multi-step-ahead) states of the object’s condition. The forecasting in this integrated classifier is performed on the input variables so as to make it easier to track the error sources in diagnostic operations.

Figure 1.

The initial membership functions (MFs) for the input state variables.

The developed classifier is a neural fuzzy (NF) paradigm that facilitates the incorporation of diagnostic knowledge from experts and the extraction of new knowledge during operation through online/offline training. The diagnostic classification is performed by fuzzy logic (Jang, 1993), whereas an adaptive training algorithm, as discussed in Section 4, is used to fine-tune the fuzzy system parameters and structures. The conditions of each object (machinery component or subsystem) are classified into three categories: healthy ($C_1$), possible (initial) damage ($C_2$), and damage ($C_3$). $\{x_1, x_2, \dots, x_n\}$ are the input variables at the current time step. Three membership functions (MFs), small, medium, and large, are assigned to each input variable with the initial states shown in Fig. 1, where the fuzzy completeness (or the minimum fuzzy membership grade) is 50%.

The diagnostic classification, in terms of the diagnostic indicator $y$, is formulated in the following form:

$$ R_j:\ \text{If } (x_1 \text{ is } A_{1j}) \text{ and } (x_2 \text{ is } A_{2j}) \text{ and } \dots \text{ and } (x_n \text{ is } A_{nj}) \Rightarrow (y \in S_j \text{ with } w_j) \tag{1} $$

where $A_{ij}$ are MFs; $i = 1, 2, \dots, n$; $j = 1, 2, \dots, m$; $m$ denotes the number of rules; $S_j$ represents one of the states $C_1$, $C_2$ or $C_3$, depending on the value of the diagnostic indicator.

When multiple features (input indices) are employed for diagnostic classification, the contribution of each feature combination (association) to the final decision depends, to a large degree, on the situation under which the diagnostic decision is made. Such a contribution is characterized by a weight factor $w_j$, which is related to the feature association in each rule. The initial values of these rule weights are chosen to be unity; that is, all input state variables are initially assumed to have identical importance or robustness with respect to the overall diagnostic output.

Similarly, the diagnostic classification based on the predicted monitoring indices, $\{x'_1, x'_2, \dots, x'_n\}$, is formulated as:

$$ R_j:\ \text{If } (x'_1 \text{ is } A_{1j}) \text{ and } (x'_2 \text{ is } A_{2j}) \text{ and } \dots \text{ and } (x'_n \text{ is } A_{nj}) \Rightarrow (y' \in S_j \text{ with } w_j) \tag{2} $$

where $y'$ is the diagnostic indicator based on the forecast input variables.

The number of rules is associated with the diagnostic reasoning operations on the input state variables. In general, if all monitoring indices are small, then the object is considered healthy ($C_1$). Otherwise, the object is possibly damaged. In this case, the diagnostic classification indicator $y$ represents the faulty condition only. Each feature association (rule) corresponds to a different confidence grade $w_j$ in diagnosis. Fig. 2 schematically shows the network architecture of this integrated classifier. Unless otherwise specified, all the network links have unity weights.

The input nodes in layer 1 transmit the monitoring indices $\{x_1, x_2, \dots, x_n\}$ or their forecast future values $\{x'_1, x'_2, \dots, x'_n\}$ to the next layer. These two sets of monitoring indices are input to the network and processed separately.

Each node in layer 2 acts as an MF, which can be either a single node that performs a simple activation function or multilayer nodes that perform a complex function. The nodes in layer 3 perform the fuzzy T-norm operations. If a product operator is used, the firing strength of rule $j$ is

$$ \eta_j = \prod_{i=1}^{n} A_{ij}(x_i) \tag{3} $$
$$ \eta'_j = \prod_{i=1}^{n} A_{ij}(x'_i) \tag{4} $$

where $A_{ij}(\cdot)$ denote MF grades.

Figure 2.

The network architecture of the proposed integrated classifier.

Defuzzification is undertaken in layer 4. By normalization, the fault diagnostic indicator will be

$$ y = \frac{\sum_{j=1}^{m} \eta_j w_j}{\sum_{j=1}^{m} \eta_j} \tag{5} $$

Similarly, the fault diagnostic indicator based on forecast inputs will be

$$ y' = \frac{\sum_{j=1}^{m} \eta'_j w_j}{\sum_{j=1}^{m} \eta'_j} \tag{6} $$
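The firing strength and normalization steps in Eqs. (3) and (5) can be sketched in a few lines. This is only an illustration: the function name `diagnostic_indicator` and the array layout (one row of MF grades per rule) are assumptions made here, and the MF grades are taken as given because the MF shapes of Fig. 1 are not reproduced in the text.

```python
import numpy as np


def diagnostic_indicator(mf_grades, weights):
    """Return the normalized indicator y of Eq. (5).

    mf_grades: (m, n) array; row j holds the grades A_ij(x_i) of rule j
               (hypothetical layout chosen for this sketch).
    weights:   (m,) array of rule weights w_j.
    """
    grades = np.asarray(mf_grades, dtype=float)
    w = np.asarray(weights, dtype=float)
    eta = grades.prod(axis=1)        # Eq. (3): product T-norm per rule
    total = eta.sum()
    if total == 0.0:
        raise ValueError("no rule fired; the indicator is undefined")
    return float(eta @ w / total)    # Eq. (5): weighted, normalized sum
```

The same function applies to the forecast inputs of Eqs. (4) and (6) by passing the grades of the predicted indices instead.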

The states of the diagnostic indicator $y$ (or $y'$) are further classified into three categories:

$$ \begin{cases} \text{If } 0 \le y \le 0.33: & \text{Healthy } (C_1) \\ \text{If } 0.33 < y \le 0.66: & \text{Possibly damaged } (C_2) \\ \text{If } 0.66 < y \le 1: & \text{Damaged } (C_3) \end{cases} \tag{7} $$

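The final decision regarding the health condition of the object of interest is made by:

$$ \begin{aligned} &\text{a) If } (y \in C_1 \text{ and } y' \in C_1) \text{ or } (y \in C_2 \text{ and } y' \in C_2) \text{ then (the object is healthy, } C_1\text{)} \\ &\text{b) If } (y \in C_3 \text{ and } y' \in C_3) \text{ or } (y \in C_2 \text{ and } y' \in C_3) \text{ then (the object is damaged, } C_3\text{)} \\ &\text{c) Otherwise, (the object is possibly damaged, } C_2\text{)} \end{aligned} \tag{8} $$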
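The category thresholds and the combined decision can be sketched as a small lookup. The function names are hypothetical, the handling of the exact boundary values 0.33 and 0.66 is an assumption of this sketch, and the rule pairs simply transcribe conditions (a) to (c) as printed above.

```python
def classify(y):
    """Map an indicator value in [0, 1] to a category label.

    Boundary values are assigned to the lower category (an assumption).
    """
    if not 0.0 <= y <= 1.0:
        raise ValueError("indicator must lie in [0, 1]")
    if y <= 0.33:
        return "C1"   # healthy
    if y <= 0.66:
        return "C2"   # possibly damaged
    return "C3"       # damaged


def final_decision(y, y_forecast):
    """Combine the current and forecast indicators into one decision."""
    pair = (classify(y), classify(y_forecast))
    if pair in {("C1", "C1"), ("C2", "C2")}:   # condition (a)
        return "C1"
    if pair in {("C3", "C3"), ("C2", "C3")}:   # condition (b)
        return "C3"
    return "C2"                                # condition (c)
```

Keeping the pairs in explicit sets makes it easy to compare the code against the printed conditions and to adjust them if the decision table is revised.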

3. Prediction of monitoring indices

3.1. Monitoring indices

In general, most machinery defects are related to transmission systems, mainly gears and bearings. In this work, gears are used as an example to illustrate how to apply the proposed integrated classifier to machinery condition monitoring. In operation, the fault diagnosis of a gear train is conducted gear by gear. Because the measured vibration is an overall signal contributed by various vibratory sources, the primary step is to isolate the signal specific to each gear of interest by using a synchronous average filter (Wang et al., 2001). This filtering process removes the signals that are non-synchronous with the rotation of the gear of interest (e.g., those from bearings, shafts and other gears). As a result, each gear signal is computed and represented over one full revolution; this is called the signal average, and it is used for further analysis by other signal processing techniques.
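The idea behind synchronous averaging can be sketched as follows. This is a minimal illustration, not the filter of the cited reference: it assumes the vibration record has already been resampled to a fixed, integer number of samples per revolution of the gear of interest (for instance by using a once-per-revolution reference pulse), and the name `synchronous_average` is chosen here for illustration.

```python
import numpy as np


def synchronous_average(signal, samples_per_rev):
    """Average a revolution-synchronous record over whole revolutions.

    Components that repeat every revolution are preserved, while
    components that are not synchronous with the rotation tend to
    cancel as more revolutions are averaged.
    """
    x = np.asarray(signal, dtype=float)
    n_rev = x.size // samples_per_rev
    if n_rev < 1:
        raise ValueError("record is shorter than one revolution")
    # Drop any trailing partial revolution, then stack one row per revolution.
    revs = x[: n_rev * samples_per_rev].reshape(n_rev, samples_per_rev)
    return revs.mean(axis=0)
```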

Several techniques have been proposed in the literature for gear fault detection. However, because of the complexity of machinery structures and operating conditions, each fault detection technique has its own advantages and limitations, and is effective only for specific applications (Wang et al., 2001). Consequently, the features selected for fault diagnostics should be robust, that is, sensitive to component defects but insensitive to noise (i.e., signal content not carrying information of interest). In this case, three features from the information domains of energy, amplitude, and phase are employed for the diagnosis operation:

1. Wavelet energy function, using the overall residual signal, which is obtained by bandstop filtering out the gear mesh frequency $f_R N$ and its harmonics, where $f_R$ is the rotation frequency (in Hz) of the gear of interest and $N$ is the number of teeth of the gear;

2. Phase demodulation (McFadden, 1986), using the signal average;

3. Beta kurtosis, using the overall residual signal.

The details of these reference functions are listed in Appendix A.
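One simple way to obtain a residual signal of the kind used by features 1 and 3 is to remove the mesh harmonics in the frequency domain. The sketch below is an assumption-laden illustration rather than the author's filter: it works on a signal average covering exactly one revolution, so that FFT bin k corresponds to k cycles per revolution, and it zeroes single bins at multiples of the tooth count instead of applying a bandstop filter with a finite bandwidth. The name `residual_signal` is hypothetical.

```python
import numpy as np


def residual_signal(signal_average, n_teeth):
    """Remove the gear mesh component and its harmonics from a signal average.

    signal_average: samples covering exactly one gear revolution.
    n_teeth:        number of teeth N; the mesh frequency then falls on
                    FFT bin N and its harmonics on bins 2N, 3N, ...
    """
    x = np.asarray(signal_average, dtype=float)
    spectrum = np.fft.rfft(x)
    spectrum[n_teeth::n_teeth] = 0.0      # zero bins N, 2N, 3N, ...
    return np.fft.irfft(spectrum, n=x.size)
```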

Based on the derived reference functions, the monitoring indices are determined to quantify the feature characteristics. Each index is a function of two variables, magnitude and position. The magnitude of an index is the normalized relative maximum amplitude of the corresponding reference function; the position is where that maximum amplitude is located. Usually, the maximum amplitude positions in these reference functions do not coincide exactly because of the phase lags in signal processing. Based on simulation and test observations, an influence window is defined as a span of four tooth periods in this case. Correspondingly, if all indices are located within one influence window, one set of inputs $\{x_1, x_2, x_3\}$ is given to the classifier. Otherwise, if the three indices are not within one influence window, the object either has no fault or has more than one defect, and more than one set of inputs should be provided to the classifier. For example, if $x_3$ does not fall within the influence window determined by $x_1$ and $x_2$, two sets of inputs are given to the monitoring classifier: the first input vector is $\{x_1, x_2, x_3\}$, where $x_3$ is computed over the influence window determined by both $x_1$ and $x_2$; the second input vector is $\{x_1, x_2, x_3\}$, where $x_1$ and $x_2$ are determined over the influence window around $x_3$.
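The influence-window test described above can be sketched as a check on the peak positions of the three reference functions. The function name and its parameters are hypothetical, positions are taken as sample indices within one revolution, and treating the revolution as circular (so that a window may wrap from the last tooth to the first) is an assumption of this sketch.

```python
def within_influence_window(positions, samples_per_tooth, n_teeth, window_teeth=4):
    """Return True if all peak positions fit inside one influence window.

    positions:         peak locations as sample indices in one revolution.
    samples_per_tooth: samples per tooth period.
    n_teeth:           number of teeth of the gear.
    window_teeth:      window length in tooth periods (four in the text).
    """
    rev = samples_per_tooth * n_teeth
    window = window_teeth * samples_per_tooth
    pts = sorted({p % rev for p in positions})
    if len(pts) < 2:
        return True
    # Gaps between neighbouring peaks, including the wrap-around gap.
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(rev - (pts[-1] - pts[0]))
    # The shortest arc covering all peaks is the revolution minus the largest gap.
    return rev - max(gaps) <= window
```

When the function returns False, the text calls for more than one input set to be built, one per window.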

Fig. 3 illustrates an example of the reference functions corresponding to a healthy gear with 41 teeth. Fig. 3a shows part of the original vibration signal measured from the experimental setup to be illustrated in Section 5. Fig. 3b represents the signal average of the gear of interest, which is obtained by synchronous average filtering; each wave represents a tooth period. Figs. 3c to 3e represent the resulting reference functions of the wavelet energy, beta kurtosis, and phase modulation, respectively. It is seen that no specific irregularities can be found from these reference functions for this healthy gear.

Figure 3.

Processing results for a healthy gear: (a) Part of the original vibration signal; (b) Signal average; (c) Wavelet reference function; (d) Beta kurtosis reference function; (e) Phase modulation reference function.

Fig. 4 shows the processing results corresponding to a cracked gear with 41 teeth. It is impossible to recognize the gear damage from the original signal (Fig. 4a). A slight signature irregularity can be recognized around 200 in the signal average graph (Fig. 4b). However, this gear damage can be identified clearly from the proposed reference functions (Figs. 4c to 4e). Although the maximum peak positions differ slightly from one graph to another, these peaks occur within one influence window (four tooth periods in this case).

Figure 4.

Processing results for a cracked gear: (a) Part of the original vibration signal; (b) Signal average; (c) Wavelet reference function; (d) Beta kurtosis reference function; (e) Phase modulation reference function.

Fig. 5 illustrates the processing results for a chipped gear (with 41 teeth). Some signature irregularity can be recognized around 200 in the signal average graph (Fig. 5b) due to this gear tooth damage. However, this defect can be clearly identified from the other three reference functions (Figs. 5c to 5e), and the monitoring indices are located within one influence window (four tooth periods).

Figure 5.

Processing results for a chipped gear: (a) Part of the original vibration signal; (b) Signal average; (c) Wavelet reference function; (d) Beta kurtosis reference function; (e) Phase modulation reference function.

3.2. Forecasting of the monitoring indices

System state forecasting is the process of predicting the future states of a dynamic system based on available observations. Several techniques have been suggested in the literature for time series forecasting. The classical methods use stochastic models (Chelidze & Cusumano, 2004), which are usually difficult to derive for mechanical systems with complex structures. More recent research on time series forecasting has focused on data-driven paradigms, such as neural networks and neural fuzzy schemes (Tse & Atherton, 1999; Pourahmadi, 2001). In this work, the multi-step-ahead prediction of the input variables (indices) is performed by a predictor as suggested in (Wang & Vrbanek, 2007), whose effectiveness has been verified: it can capture and track the system’s dynamic characteristics quickly and accurately, and it outperforms other related classical forecasting schemes.

Given a monitoring index $x_1$, $x_2$, or $x_3$, if $\{v_0, v_{-r}, v_{-2r}, v_{-3r}\}$ represent its current and previous three states with an interval of $r$ steps, the $r$-step-ahead state $v'_{+r}$ is estimated by a TS-1 fuzzy formulation:

$$ R_j:\ \text{If } (v_0 \text{ is } B_{0k}) \text{ and } (v_{-r} \text{ is } B_{1k}) \text{ and } (v_{-2r} \text{ is } B_{2k}) \text{ and } (v_{-3r} \text{ is } B_{3k}) \text{ then } v'_{+r} = c_{0j} v_0 + c_{1j} v_{-r} + c_{2j} v_{-2r} + c_{3j} v_{-3r} + c_{4j} \tag{9} $$

where $B$ are MFs and $c_{ij}$ are constants; $i = 0, 1, \dots, 3$; $j = 1, 2, \dots, 16$; $k = 1, 2$. Fig. 6 illustrates its fuzzy reasoning architecture.

Figure 6.

The network architecture of the multi-step predictor.

This NF predictor has a weighted feedback link to each node in layer 2 to deal with time explicitly, as opposed to representing temporal information spatially. The context units copy the activations of output nodes from the previous time step, and allow the network to memorize clues from the past, which forms a context for current processing. This function of recurrent networks is valuable for predictors with limited and step inputs (i.e., $r > 1$), because it provides more information to the network and so improves forecasting accuracy. If two sigmoid MFs are assigned to each input variable, the node output at the $k$th process step will be

$$ \mu_{B_{ij}}(V_{-ir}) = \frac{1}{1 + \exp[-a_{ij}(V_{-ir} - b_{ij})]} \tag{10} $$
$$ V_{-ir} = v_{-ir}(k) + w_{im}\, \mu_{B_{ij}}\big(v_{-ir}(k-1)\big) = v_{-ir}(k) + \frac{w_{im}}{1 + \exp[-a_{ij}(v_{-ir}(k-1) - b_{ij})]} \tag{11} $$

where $m = 1, 2$; $i = 0, 1, \dots, n$. $v_{-ir}(k)$ and $v_{-ir}(k-1)$ are, respectively, the input $v_{-ir}$ at the $k$th and $(k-1)$th time steps, where $k = 1, 2, \dots, K$, and $K$ is the total number of time steps (or training data sets). If a max-product operator is applied in layer 3 and a centroid method is used for defuzzification in layer 5, then by some related fuzzy operations the predicted output $v'_{+r}$ can be determined by

$$ v'_{+r} = \sum_{j=1}^{16} \bar{\mu}_j \left( c_{0j} v_0 + c_{1j} v_{-r} + c_{2j} v_{-2r} + c_{3j} v_{-3r} + c_{4j} \right) \tag{12} $$

where $\bar{\mu}_j = \mu_j \big/ \sum_{j=1}^{16} \mu_j$ denotes the normalized rule firing strength, and $\mu_j$ is the firing strength of the $j$th rule.
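The forward pass of Eq. (12) can be illustrated with a short sketch. Several simplifications are assumed here and are not taken from the source: the recurrent feedback term of Eq. (11) is omitted, the rule firing strength is taken as the plain product of one sigmoid grade per input, and the 16 rules are enumerated as every combination of the two MFs over the four inputs. All function and parameter names are hypothetical.

```python
import itertools

import numpy as np


def sigmoid_mf(v, a, b):
    """Sigmoid membership grade in the form of Eq. (10)."""
    return 1.0 / (1.0 + np.exp(-a * (v - b)))


def ts1_predict(v, a, b, c):
    """Sketch of the TS-1 output of Eq. (12), without recurrent feedback.

    v:    (4,) inputs [v_0, v_-r, v_-2r, v_-3r].
    a, b: (4, 2) sigmoid parameters, two MFs per input.
    c:    (16, 5) consequent constants [c_0j, ..., c_4j] per rule.
    """
    v = np.asarray(v, dtype=float)
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    grades = sigmoid_mf(v[:, None], a, b)               # shape (4, 2)
    # One rule per combination of MF index k for each of the four inputs.
    combos = itertools.product(range(2), repeat=4)
    mu = np.array([np.prod([grades[i, k] for i, k in enumerate(combo)])
                   for combo in combos])                # shape (16,)
    mu_bar = mu / mu.sum()                              # normalized strengths
    consequents = c[:, :4] @ v + c[:, 4]                # linear rule outputs
    return float(mu_bar @ consequents)
```

Because the normalized strengths sum to one, identical consequents in every rule return that common linear function regardless of the MF parameters, which gives a convenient sanity check.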

The fuzzy system parameters are trained by using a hybrid algorithm: the premise parameters in the MFs $B$ are trained by a real-time recurrent training algorithm, whereas the consequent parameters $c_{ij}$ in (9) are updated by least squares estimate (LSE). Details about the training algorithm can be found in (Wang, 2008).


4. Online training of the diagnostic classifier

The developed diagnostic classifier should be optimized in order to achieve the desired input-output mapping. Several training algorithms have been proposed in the literature for NF-based classification schemes (Figueiredo et al., 2004; Castellano et al., 2004). In offline training, representative data should cover all of the possible application conditions (Korbicz et al., 2004); such a requirement is usually difficult to meet in real-world machinery applications because most machinery operates in noisy and uncertain environments. Furthermore, machinery dynamic characteristics may change suddenly, for instance, just after repair or regular maintenance. Therefore, an adaptive training algorithm is preferred in time-varying systems to accommodate different machinery conditions (Wang & Lee, 2002). In this case, a hybrid method based on recursive Levenberg-Marquardt (LM) and LSE is adopted to train the integrated classifier. Such a training approach possesses randomness that may help to escape certain local minima.

4.1. Training the premise MF parameters

The nonlinear premise MF parameters will be trained by adopting the recursive LM method. The general LM algorithm possesses quadratic convergence close to a minimum. Its convergence property is still reasonable, even if the initial estimates are poor. In addition, the LM algorithm has been proven globally convergent in many applications by properly choosing the step factors.

For a training data pair $\{x^{(p)}, d^{(p)}\}$, the inputs are $x^{(p)} = \{x_1^{(p)}, x_2^{(p)}, x_3^{(p)}\}$, $p = 1, 2, \dots, P$; $d^{(p)}$ are the desired outputs $\{0, 0.5, 1\}$ when $x^{(p)}$ belongs to $C_1$, $C_2$ and $C_3$, respectively. The error function with respect to the adjustable MF parameters $\theta_p$ at the current time instant, $p$, is

$$ E(\theta_p) = \frac{1}{2} \sum_{p=1}^{P} \left[ y_p(\theta_p) - d_p \right]^2 = \frac{1}{2} \sum_{p=1}^{P} r_p^2(\theta_p) = \frac{1}{2}\, r_p^T(\theta_p)\, r_p(\theta_p) \tag{13} $$

where $y_p(\theta_p)$ is the $p$th output determined by Eq. (5), $p = 1, 2, \dots, P$, and $d_p$ is the desired output.

To simplify expressions, the variable $\theta_p$ is dropped in the related terms in this section. $r_p$ is the error vector, which can be either linear or nonlinear. By taking the Taylor series expansion and neglecting higher order terms,

$$ \theta_{p+1} \approx \theta_p + \lambda \left( J_p^T J_p + \eta I \right)^{-1} J_p^T r_p = \theta_p + (1 - \alpha)\, H_p^{-1} J_p^T r_p \tag{14} $$

where $J_p \in \mathbb{R}^{N \times Z}$ denotes the Jacobian matrix; $Z$ is the dimension (or the number of adjustable parameters) of $\theta_p$; $H_p \in \mathbb{R}^{Z \times Z}$ is the modified Hessian matrix; $I \in \mathbb{R}^{Z \times Z}$ is an identity matrix; $\lambda = 1 - \alpha$ is the learning rate, and $\alpha$ is the forgetting factor.

The Hessian matrix can be expressed as

$$ H_p = \alpha H_{p-1} + (1 - \alpha) \left( J_p^T J_p + \eta I \right) \tag{15} $$

In implementation, instead of computing the $Z \times Z$ matrix $\eta I$ at each time step, a single diagonal element is added at each time step:

$$ H_p = \alpha H_{p-1} + (1 - \alpha) \left( J_p^T J_p + Z \eta \Lambda \right) \tag{16} $$

where $\Lambda \in \mathbb{R}^{Z \times Z}$ has only one nonzero element, located at the $\{p \bmod (Z) + 1\}$ diagonal position:

$$ \Lambda_{ii} = \begin{cases} 1, & \text{if } i = \{p \bmod (Z) + 1\} \\ 0, & \text{otherwise} \end{cases} \tag{17} $$

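Correspondingly, (16) can be rewritten as

$$ H_p = \alpha H_{p-1} + (1 - \alpha) \left[ U V^{-1} U^T \right] \tag{18} $$

where $U$ is a $Z \times 2$ matrix whose first column is $J_p$ and whose second column is a $Z \times 1$ vector with a single element of 1 at the position $\{p \bmod (Z) + 1\}$:

$$ U^T(\theta_p) = \begin{bmatrix} J_p^T \\ 0 \ \cdots \ 0 \ \ 1 \ \ 0 \ \cdots \ 0 \end{bmatrix}, \quad \text{and} \quad V^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & Z\eta \end{bmatrix} \tag{19} $$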

The computation of $H_p^{-1}$ in (14) is time consuming and is not suitable for real-time applications. To solve this problem, Eq. (14) is rewritten as

$$ \theta_{p+1} = \theta_p + (1 - \alpha) H_p^{-1} J_p^T r_p = \theta_p + (1 - \alpha) \left\{ (\alpha H_{p-1})^{-1} - (\alpha H_{p-1})^{-1} (1 - \alpha) U \left[ V + U^T (\alpha H_{p-1})^{-1} (1 - \alpha) U \right]^{-1} U^T (\alpha H_{p-1})^{-1} \right\} J_p^T r_p \tag{20} $$

Based on the matrix inversion formula and by some manipulations, Eq. (20) becomes

$$ \theta_{p+1} = \theta_p + (1 - \alpha) \left\{ \alpha H_{p-1} + (1 - \alpha)\, U V^{-1} U^T \right\}^{-1} J_p^T r_p \tag{21} $$

The recursive LM algorithm can be represented by

$$ \theta_{p+1} = \theta_p + \Phi_p J_p r_p \tag{22} $$
$$ \Phi_p = \frac{1}{\alpha} \left[ \Phi_{p-1} - \frac{\Phi_{p-1} U U^T \Phi_{p-1}}{\alpha V + U^T \Phi_{p-1} U} \right] \tag{23} $$

The denominator $\alpha V + U^T \Phi_{p-1} U$ is a matrix with dimension $2 \times 2$; its inverse is simple to compute and can be implemented for real-time applications. The initial parameter vector is $\theta_0 = 0$. $\Phi_p$ is a covariance matrix with initial condition $\Phi_0 = \rho I$, where $\rho$ is a positive quantity and $I$ is an identity matrix.

By simulation tests with the requirements of a recognition rate ≥ 80% and reasonable training speed and accuracy, the following initial values are given to the related parameters in this study: $\eta = 0.01$ with a tested range of $\eta \in [0.001, 10]$; $\alpha = 0.995$ with a tested range of $\alpha \in [0.95, 1]$; $\rho = 10^3$ with a tested range of $\rho \in [10^2, 10^5]$.
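One recursion of Eqs. (22) and (23) can be sketched with NumPy as below. Several choices here are assumptions of this sketch rather than statements of the author's implementation: the Jacobian is treated as a length-$Z$ vector for a scalar output, the residual is taken as desired minus actual output so that the "+" update reduces the error, the matrix fraction in Eq. (23) is evaluated as a product with the inverse of the $2 \times 2$ matrix, and the function name is hypothetical. The defaults for `alpha` and `eta` are the initial values quoted above.

```python
import numpy as np


def recursive_lm_step(theta, Phi, jac, resid, p, alpha=0.995, eta=0.01):
    """One recursive LM update in the spirit of Eqs. (22)-(23).

    theta: (Z,) current parameters.
    Phi:   (Z, Z) current covariance matrix.
    jac:   (Z,) derivative of the model output w.r.t. theta (assumed shape).
    resid: scalar residual, assumed here to be desired minus actual output.
    p:     time step index; selects the regularized diagonal position.
    """
    Z = theta.size
    U = np.zeros((Z, 2))
    U[:, 0] = jac                 # first column: Jacobian
    U[p % Z, 1] = 1.0             # second column: unit vector, cf. Eq. (17)
    V = np.diag([1.0, 1.0 / (Z * eta)])   # inverse of V^{-1} in Eq. (19)
    S = alpha * V + U.T @ Phi @ U         # the 2 x 2 matrix to invert
    Phi_new = (Phi - Phi @ U @ np.linalg.solve(S, U.T @ Phi)) / alpha
    theta_new = theta + Phi_new @ jac * resid
    return theta_new, Phi_new
```

Only a 2 × 2 linear system is solved per step, which is the property that makes the recursion attractive for real-time use. A typical loop would start from `theta = np.zeros(Z)` and `Phi = 1e3 * np.eye(Z)`, matching the initial conditions stated in the text.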

4.2. Implementation of the hybrid training method

In implementation, inside each training epoch, the nonlinear MF parameters in the classifier are optimized in the backward pass by using the recursive LM method, whereas the linear consequent rule weights are updated by LSE in the forward pass. In addition, after training or real application over some time period, if an updated rule weight wj is sufficiently small (e.g., wj < 0.01), the contribution of the related rule to the final classification can be neglected, and that rule can be removed from the rule base.
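Because the indicator of Eq. (5) is linear in the rule weights, the forward-pass LSE and the pruning step can be sketched as below. This is a batch least-squares illustration under assumptions made here (the source does not say whether its LSE is batch or recursive), and both function names are hypothetical.

```python
import numpy as np


def fit_rule_weights(firing, desired):
    """Least-squares estimate of the rule weights w_j.

    firing:  (P, m) firing strengths eta_j for each of P training samples.
    desired: (P,) desired outputs d_p.
    """
    eta = np.asarray(firing, dtype=float)
    eta_bar = eta / eta.sum(axis=1, keepdims=True)   # normalized strengths
    w, *_ = np.linalg.lstsq(eta_bar, np.asarray(desired, dtype=float),
                            rcond=None)
    return w


def prune_rules(weights, threshold=0.01):
    """Return the indices of rules whose weight is not below the threshold."""
    return np.flatnonzero(np.asarray(weights, dtype=float) >= threshold)
```

The threshold default of 0.01 follows the example value given in the text; the indices returned by `prune_rules` identify the rules that remain in the rule base.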


5. Performance evaluation

5.1. Experimental setup

Fig. 7 shows the experimental setup used in this study to verify the performance of the proposed integrated classifier.

The apparatus is anchored onto a massive concrete block. It consists of a 3-HP AC drive motor and a gearbox. The motor rotation is controlled by a speed controller which allows tested gears to operate in the range of 20 to 4200 rpm. An optical sensor provides a one-pulse-per-revolution signal which is used as the reference for the time synchronous average filtering. The gearbox consists of two pairs of spur or helical gears. The shafts in the gearbox are mounted to the housing by rolling element bearings. The load is provided by a magnetic loading system which is connected to the output shaft. The speed of the drive motor and the load are adjusted to simulate different speed/load operating conditions. The vibration is measured using ICP accelerometers mounted on the gearbox housing along different orientations. After being properly preconditioned, the collected signals are fed to a computer for further processing.

Figure 7.

The experimental setup: 1-speed controller, 2-motor, 3-optical sensor, 4-gearbox, 5-load controller, 6-loading system, 7-sensors.

5.2. Performance evaluation

To verify the viability of the proposed classifier, five gear cases are tested in this study, as represented in Fig. 8:

a. healthy gears (C1);

b. gears having a tooth crack with 15% (C2) and 50% (C3) tooth root thickness;

c. gears having a chipped tooth with 10% (C2) and 40% (C3) tooth surface area removed.

Figure 8.

Gear conditions tested: (a) healthy gear, (b) cracked gear; (c) chipped gear.

These demonstrated faults belong to localized gear defects. From the signal property standpoint, when a localized fault occurs, some high-amplitude pulses will be generated due to impacts, which are relatively easier for a signal processing technique to recognize. When a localized fault propagates towards a distributed defect, the overall energy of the fault will increase, but it often becomes more wideband in nature and difficult to detect in the presence of the other vibratory components of the machine. This example identifies a characteristic of currently used fault detection techniques: It is usually easier to detect a distinct low-level narrowband tone than a high-level wideband signal in the presence of other signals or noises. Even though a distributed defect, such as pitting and wear, is initiated from a localized fault which is detectable as an incipient defect, most currently available vibration-based signal processing techniques cannot effectively detect an advanced distributed fault which, however, can be diagnosed based on other information carriers, such as acoustic signals.

To make a comparison, the diagnostic results from the following three classifiers are also listed:

  1. A pure fuzzy system with a similar reasoning architecture as in Fig. 2 but without the use of predictors. The rule weight factors are chosen as those in the integrated classifier after initial training.

  2. Classifier-1: An NF classifier with a similar reasoning architecture as in Fig. 2 but without predictors. Its MF parameters are trained by a gradient-LSE algorithm.

  3. Classifier-2: Same as Classifier-1, but trained by the hybrid algorithm of the recursive LM and LSE.

Given the network architectures, the initial parameters of the three adaptive classifiers can be preliminarily trained by using data sets collected in previous tests on the same test apparatus, or initialized by experience. These classifier parameters are then optimized in the subsequent online training processes.

During online tests, motor speed and load levels are randomly changed to simulate general and unusual machinery operating conditions. The tests are conducted under load levels from 0.5 to 3 hp, and motor speeds from 50 to 3600 rpm.

In online monitoring, based on test schedule and load/speed change frequency, the monitoring time-interval is set at 15 minutes; that is, all the monitoring schemes are applied automatically every 15 minutes for condition monitoring operations. Three-steps-ahead predictors (i.e., r = 3) are used in the integrated classifier. The selection of data size depends on noise reduction requirement; usually the data for the gear with the lowest speed should cover more than 100 revolutions. For example, if the slowest gear speed in the gearbox is 1200 rpm, the data acquisition process takes at least 5 seconds (15 seconds in this case). The monitoring is performed gear by gear. Three examples corresponding to healthy, cracked and chipped gears (all having 41 teeth) have been illustrated in Figs. 3 to 5, respectively.
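The minimum acquisition time in the example above follows from simple arithmetic. As a worked check (the symbols $T_{\min}$, $N_{\text{rev}}$ and $n$ are introduced here only for illustration), covering 100 revolutions of a gear turning at 1200 rpm requires

$$ T_{\min} = \frac{N_{\text{rev}}}{n / 60} = \frac{100\ \text{rev}}{(1200 / 60)\ \text{rev/s}} = 5\ \text{s} $$

so the 15 seconds used in this case correspond to 300 revolutions of the slowest gear at that speed.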

Each healthy gear condition is tested over 24 hours whereas each faulty gear condition is tested over 50 hours. In total, 386 data pairs are recorded for testing purpose. Table 1 summarizes the classification performance by different diagnostic schemes.

Table 1.

Comparison of the diagnostic results from different diagnostic schemes. M.A.-Missed Alarms, F.A.- False Alarms.

The fuzzy classifier records 15 missed alarms and 37 false alarms, with an overall reliability of 85.3%. Its relatively poor diagnostic performance is mainly due to the lack of learning capability. In addition, fixed or human-determined system parameters are subject to variations and are rarely optimal in terms of reproducing the desired classification outputs, which results in the fuzzy classifier not being optimized under different operating conditions.

Classifier-1 records 7 missed alarms and 21 false alarms, with an overall reliability of 92.5%. One difference between this NF system and the fuzzy classifier is related to the rule weight factors. Each signal processing technique (and the resulting feature) has a limited capability in fault detection. Even if the firing strengths of two fuzzy if-then rules are identical, their diagnostic reliabilities may be different under different machinery conditions. Therefore, rule weights play an important role in the diagnostic classification operations.

Classifier-2 records 7 missed alarms and 17 false alarms, with an overall reliability of 93.6%. The main difference between Classifier-2 and Classifier-1 lies in the training algorithms. It is seen that the recursive LM algorithm is superior to the gradient method in convergence, and its randomness reduces the chance of being trapped in local minima. In addition, each rule has its own decision (mapping) space, whereas the MFs and the rule weights are directly associated with the characteristics of the decision space. The efficient optimization of classifiers can adjust the boundary characteristics of the decision space so as to reduce misclassifications. This property is especially important for classifiers with coarse fuzzy partitions.

The developed integrated classifier generates 3 missed alarms and 7 false alarms, with an overall reliability of 97.6%. Compared with Classifier-2, the integrated classifier enhances the classification accuracy by properly incorporating the forecast future states. It follows that adaptively fine-tuning the fuzzy parameters is necessary to enhance the approximation of the mapping from the observed symptoms to the underlying faults. In addition, the fault severity can be recognized because, to some extent, the greater the fault, the more pronounced the feature modulation, and the larger the monitoring indices will become.

The developed integrated diagnostic classifier provides a robust problem solving framework. Machinery conditions vary dramatically in real-world applications, and new system conditions may occur under different circumstances. With the help of an adequate learning algorithm, new information can be extracted from online training, and the diagnostic knowledge base can be expanded automatically to accommodate different machinery conditions.

In general, the deterioration history of most machinery components follows a “U curve” as illustrated in Fig. 9. It consists of four periods: the run-in stage (I), the normal operation period (II), and the initial (III) and advanced (IV) failure stages. Such a trend characteristic is easy for a powerful NF predictor to capture. If a false alarm is generated during the healthy period II, the false alarm is induced by noise instead of a real defect. Based on the forecast result, the diagnostic state should lie in period III (or initial defect). However, random noise will disappear in the following processing steps, and the diagnostic indicator should return to period II (or healthy). Correspondingly, this misclassification can be prevented by the integrated classification/forecasting information. On the other hand, if an object is damaged, its diagnostic indicator should lie in period III (or IV). If a misclassification occurs, that is, the diagnostic indicator falls in period II, the forecast information will contradict that from the classifier. Comprehensive analysis in Eq. (8) can avoid this possible missed alarm so as to improve fault diagnostic reliability. In both of the aforementioned examples, the classifier will be updated to accommodate such noise in the following monitoring applications.

Figure 9.

The deterioration trend of a machinery component.
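The reasoning above, in which forecast information is used to cross-check the current classification, can be sketched as follows. This is only a minimal illustration and does not reproduce Eq. (7): the normalized indicator scale, the thresholds `initial` and `advanced`, the equal weighting, and all function names are hypothetical, and the run-in stage (I) is not modeled.

```python
HEALTHY = "II"
INITIAL_DEFECT = "III"
ADVANCED_DEFECT = "IV"


def stage(indicator, initial=0.5, advanced=0.8):
    """Map a normalized diagnostic indicator to a deterioration period.

    The thresholds are hypothetical values chosen for illustration.
    """
    if indicator >= advanced:
        return ADVANCED_DEFECT
    if indicator >= initial:
        return INITIAL_DEFECT
    return HEALTHY


def diagnose(current, forecasts, weight_current=0.5):
    """Combine the current indicator with multi-step forecast indicators.

    Returns the fused stage and a flag that is True when the current
    classification and the forecast disagree, which is the situation in
    which the classifier would be updated.
    """
    forecast_mean = sum(forecasts) / len(forecasts)
    fused = weight_current * current + (1.0 - weight_current) * forecast_mean
    contradictory = stage(current) != stage(forecast_mean)
    return stage(fused), contradictory


# Noise-induced spike on a healthy component: the forecast pulls it back.
print(diagnose(0.6, [0.2, 0.25]))   # ('II', True)
# Damaged component whose current reading looks healthy: the forecast flags it.
print(diagnose(0.4, [0.7, 0.75]))   # ('III', True)
# Consistent healthy readings: no contradiction.
print(diagnose(0.2, [0.2, 0.2]))    # ('II', False)
```

A weighted average is only one possible fusion rule. It is used here because it makes both the false-alarm and the missed-alarm cases easy to trace by hand.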


5. Conclusions

In this paper, an integrated classifier is developed for gear fault diagnostics. The purpose is to provide industries with a more reliable monitoring tool to prevent machinery system performance degradation, malfunction, and sudden failure. The classifier can integrate different features for a more positive assessment of the object’s health condition. The diagnostic reliability is improved by properly integrating the future states of the gear, which are forecast by multi-step predictors. An online hybrid training technique based on a recursive LM and LSE is adopted to improve the classifier’s convergence and adaptive capability to accommodate different machinery conditions. The viability of the new integrated classifier has been verified by experimental tests corresponding to different gear conditions.

It should be noted, however, that although satisfactory results have been achieved with the developed integrated classifier, its network architecture is relatively complex, which may make it difficult to implement in some real-world applications. Future research will develop novel evolving fuzzy or neuro-fuzzy classification schemes for more effective diagnostic operations. New training algorithms will be proposed to further improve the training convergence. The proposed techniques will also be applied to real-world industrial applications in vehicles, wind turbines, and manufacturing facilities.



Acknowledgements

This work was partly supported by MC Technologies Inc. and Materials and Manufacturing Ontario in Canada.


References

  1. Chelidze D., Cusumano J. (2004). A dynamical systems approach to failure prognosis, Journal of Vibration and Acoustics, 126, 1-7.
  2. Castellano G., Fanelli A., Mencar C. (2004). An empirical risk functional to improve learning in a neuro-fuzzy classifier, IEEE Transactions on Systems, Man, Cybernetics, Part B, 34, 725-31.
  3. Figueiredo M., Ballini R., Soares S., Andrade M., Gomide F. (2004). Learning algorithms for a class of neurofuzzy network and applications, IEEE Transactions on Systems, Man, Cybernetics, Part C, 34, 293-301.
  4. Gusumano J., Chelidze D., Chatterjee A. (2002). Dynamical systems approach to damage evolution tracking, part 2: Model-based validation and physical interpretation, Journal of Vibration and Acoustics, 124, 258-264.
  5. Ishibuchi H., Yamamoto Y. (2005). Rule weight specification in fuzzy rule-based classification systems, IEEE Transactions on Fuzzy Systems, 13, 428-435.
  6. Isermann R. (1998). On fuzzy logic applications for automatic control, supervision, and fault diagnosis, IEEE Transactions on Systems, Man, Cybernetics, Part A, 28, 221-235.
  7. Jang J. (1993). ANFIS: adaptive-network-based fuzzy inference system, IEEE Transactions on Systems, Man, Cybernetics, 23, 665-685.
  8. Korbicz J., Koscielny J., Kowalczuk Z., Cholewa W. (2004). Fault Diagnosis: Models, Artificial Intelligence, Applications, Springer.
  9. Li L., Lee H. (2005). Gear fatigue crack prognosis using embedded model, gear dynamic model and fracture mechanics, Mechanical Systems and Signal Processing, 20, 836-846.
  10. Mansoori E., Zolghadri M., Katebi S. (2007). A weighting function for improving fuzzy classification systems performance, Fuzzy Sets and Systems, 158, 583-591.
  11. McFadden P. (1986). Detecting fatigue cracks in gears by amplitude and phase demodulation of the meshing vibration, Journal of Vibration, Acoustics, Stress, and Reliability in Design, 108, 165-170.
  12. Pourahmadi M. (2001). Foundation of Time Series Analysis and Prediction Theory, John Wiley & Sons.
  13. Rish I., Brodie M., Ma S., Odintsova N., Beygelzimer A., Grabarnik G., Hernandez K. (2005). Adaptive diagnosis in distributed systems, IEEE Transactions on Neural Networks, 16, 1088-1109.
  14. Tse P., Atherton D. (1999). Prediction of machine deterioration using vibration based fault trends and recurrent neural networks, Journal of Vibration and Acoustics, 121, 355-362.
  15. Uluyol O., Kim K., Nwadiognu E. (2006). Synergistic use of soft computing technologies for fault detection in gas turbine engines, IEEE Transactions on Systems, Man, Cybernetics, Part C, 36, 476-484.
  16. Wang J., Lee L. (2002). Self-adaptive neuro-fuzzy inference systems for classification applications, IEEE Transactions on Fuzzy Systems, 10, 790-802.
  17. Wang W. (2008). An intelligent system for machinery condition monitoring, IEEE Transactions on Fuzzy Systems, 16(1), 110-122.
  18. Wang W., Ismail F., Golnaraghi F. (2001). Assessment of gear damage monitoring techniques using vibration measurements, Mechanical Systems and Signal Processing, 15, 905-922.
  19. Wang W., Vrbanek J. (2007). A multi-step predictor for dynamic system property forecasting, Measurement Science and Technology, 18, 3673-3681.
