Bearing Fault Diagnosis Using Information Fusion and Intelligent Algorithms



Introduction
Rotating machinery is very common in industrial systems and plays an important role in industrial and economic development. With rapid industrial advancement, rotating machinery is becoming increasingly complex and requires constant attention. Although the reliability and robustness of rotating machinery have also been improving, occasional component failures often lead to unexpected downtime and huge losses. Rolling element bearings are often at the heart of such machinery, and they are among the components that fail most frequently. These faults may cause the machine to break down or degrade its performance [6]. Accurate diagnosis of incipient faults in these bearings is therefore urgently needed.
In traditional fault diagnosis, a single sensor is typically used to monitor the operating conditions of several machine components, and the collected signal contains many correlated features [33]. A running machine generates many kinds of signals. Approaches based on vibration signal analysis are advantageous because of their visual features, ease of measurement, high accuracy and reliability [34]. In recent years, a wide variety of techniques for fault diagnosis from raw vibration signals has been introduced. These techniques fall mainly into two groups: signal processing methods and intelligent systems. Signal processing methods are traditional but still commonly used. They include wavelet and wavelet packet methods [23][24][25], empirical mode decomposition [15,35], time-frequency distributions [7] and blind source separation [29]. Intelligent system approaches to fault diagnosis include artificial neural networks (ANNs) [36], support vector machines (SVMs) [33], the adaptive neuro-fuzzy inference system (ANFIS) [19] and fuzzy techniques [28]. All of these approaches rely on one data source or on an individual decision system. Many researchers have shown that an individual decision system with a single data source achieves only a limited classification capability, which may not be sufficient for a particular application [22]. It is therefore necessary to combine multiple decision systems for fault diagnosis.
Multi-sensor information fusion is an emerging interdisciplinary field that originated in military applications, and it has since been applied successfully in many different areas. In industrial equipment fault diagnosis, however, multi-source information fusion is still at an early stage. Multi-sensor information fusion is divided into three levels: sensor level, feature level and decision level. The multiple classifier ensemble approach belongs to decision-level information fusion. In recent years, the use of multiple classifiers has gained a lot of attention, and research has consistently shown the benefits of using multiple classifiers to solve complex problems [4]. In contrast, feature-level fusion has probably not received the attention it deserves [32].
Using information fusion theory, this chapter introduces several bearing fault diagnosis approaches. They fall into two categories: fault diagnosis based on feature-level fusion [11] and fault diagnosis based on decision-level fusion [14]. In the proposed fusion methods, intelligent algorithms are used for feature dimension reduction or pattern recognition. The feature-level fusion approach uses gene expression programming (GEP), while the decision-level fusion approach uses a multiple classifier ensemble method. The decision-level fusion approach builds on a new bearing fault diagnosis method [12] that uses empirical mode decomposition (EMD) and fractal feature parameter classification.

Bearing fault diagnosis using fractal feature parameter classification
In complex mechanical processes and systems, distinguishing faulty from normal machine conditions is commonly treated as a classification problem, solved by learning patterns from empirical data [31]. In this approach, a general framework for applying classification methods to fault diagnosis has two steps: representative feature extraction and pattern classification. Feature extraction is a mapping from the measured signal space to the feature space. Representative features that carry fault information are then extracted from the feature space. Pattern classification assigns the extracted features to different categories using geometric, statistical, neural or fuzzy classifiers. Recent developments in artificial intelligence have led to its use in fault diagnosis. In particular, artificial neural networks (ANNs) and support vector machines (SVMs) have been applied successfully to the intelligent fault diagnosis of mechanical equipment [27].
In practice, the classical approach is not always reliable when the extracted features are contaminated by noise. Most intelligent fault diagnosis approaches are also complex, especially for multiple-fault diagnosis problems. This section proposes a novel, simple, fast and reliable intelligent method for the multiple-fault diagnosis problem, based on EMD and fractal feature parameter extraction.

Methodology
Since its invention [21], the fractal dimension has been considered a good parameter for characterizing time sequences of natural variables. Sy-Sang Liaw and Feng-Yuan Chiu [20] presented a simple, fast and accurate method for calculating the fractal dimension of time sequences. The method considers a time sequence of $2^M + 1$ values, separated by a constant time interval, that is well fitted by a fractal function f(t) on the period [0, T]. The goal is to calculate the fractal dimension D of f(t) from the known values of f(t) at $t_j = jT/2^M$.

To do this, Liaw and Chiu first defined $L_k(f)$, the piecewise linear interpolation of f(t) at level k (k = 0, 1, 2, ..., M). $L_k(f)$ is the union of the line segments connecting the points $(t_j, f(t_j))$ and $(t_{j+1}, f(t_{j+1}))$, where $t_j = jT/2^k$ and $j = 0, 1, 2, \ldots, 2^k$ (see Figure 1). They then measured how poorly $L_k(f)$ approximates the next level of interpolation, $L_{k+1}(f)$. The error of $L_k(f)$ is defined as the sum of the absolute differences between $L_k(f)$ and $L_{k+1}(f)$ at all $t_j = jT/2^{k+1} \equiv j\varepsilon_k$:
$$\Delta_k = \sum_{j=0}^{2^{k+1}} \left| L_k(f)(t_j) - L_{k+1}(f)(t_j) \right|$$
Liaw and Chiu [20] found that $\Delta_k$ is proportional to $\varepsilon_k^{1-D}$ when k is large enough.
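Figure 1. Piecewise linear interpolation of a function f(t) (grey) at level 0 (dotted), 1 (dashed), and 2 (solid). $\Delta_k$ (thick solid) denotes the error of the kth-level interpolation with respect to level k + 1 [20].

Since $\varepsilon_k = T/2^{k+1}$, the scaling law gives $\log \Delta_k = \text{const} + (D - 1)(k + 1)\log 2$ for large k. In other words, $\log \Delta_k$ is linear in k with slope $s = (D - 1)\log 2$. The fractal dimension D of f(t) can therefore be obtained from the slope s of the log-plot of $\Delta_k$ against the level k:
$$D = 1 + \frac{s}{\log 2}$$
This holds for large enough values of k.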
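The Liaw and Chiu estimate can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation of [20], and it rests on two assumptions. First, the input must have exactly $2^M + 1$ samples. Second, to respect the "large enough k" condition, the slope is fitted only over the upper half of the levels (parameter `k_min`, a hypothetical name). The code relies on how the two interpolations line up. $L_{k+1}(f)$ passes through every level-(k+1) sample, while $L_k(f)$ passes through every other one. As a result, only the odd-indexed points contribute to $\Delta_k$.

```python
import numpy as np


def interpolation_errors(f):
    """Delta_k for k = 0, ..., M-1, given 2**M + 1 equally spaced samples of f(t)."""
    f = np.asarray(f, dtype=float)
    if len(f) < 3:
        raise ValueError("the sequence must contain 2**M + 1 samples with M >= 1")
    M = int(round(np.log2(len(f) - 1)))
    if 2 ** M + 1 != len(f):
        raise ValueError("the sequence must contain 2**M + 1 samples")
    deltas = []
    for k in range(M):
        step = 2 ** (M - k - 1)            # index spacing of the level-(k+1) points
        pts = f[::step]                    # f(t_j), t_j = j*T/2**(k+1), j = 0..2**(k+1)
        # At an odd-indexed point, L_k(f) is the average of its two neighbours and
        # L_{k+1}(f) equals f; at even-indexed points the two interpolations agree.
        midpoint = 0.5 * (pts[:-2:2] + pts[2::2])
        deltas.append(np.sum(np.abs(pts[1::2] - midpoint)))
    return np.array(deltas)


def fractal_dimension(f, k_min=None):
    """D = 1 + s / log 2, where s is the fitted slope of log(Delta_k) versus k."""
    deltas = interpolation_errors(f)
    ks = np.arange(len(deltas))
    if k_min is None:
        k_min = len(deltas) // 2           # assumption: fit only the larger levels
    if np.any(deltas[k_min:] <= 0):
        raise ValueError("Delta_k must be positive in the fitted range")
    s, _ = np.polyfit(ks[k_min:], np.log(deltas[k_min:]), 1)
    return 1.0 + s / np.log(2.0)
```

As a sanity check, a discrete random walk (a Brownian-like path) should give a value close to 1.5. The estimator targets fractal signals with 1 < D < 2. For a smooth curve, the midpoint errors shrink as $\varepsilon_k^2$, and the formula no longer returns 1.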
In this bearing fault diagnosis method, the raw vibration signal is treated as a time sequence. Raw vibration signals are often heavily contaminated by noise, owing to the combined interference of other machine elements and background noise in the measuring device [2]. EMD is therefore used to decompose the raw vibration signal and filter out noise before its fractal feature is extracted. As discussed by Huang et al. [10], the EMD method is designed to deal with non-stationary and nonlinear signals. It is based on the simple assumption that any data series consists of different simple intrinsic modes of oscillation. Using EMD, a complicated signal can be decomposed into a finite set of intrinsic mode functions (IMFs). Each IMF must satisfy two conditions:
(1) Over the whole data set, the number of extrema and the number of zero crossings must either be equal or differ by at most one.
(2) At any time point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.
Let x(t) be a vibration signal. Its empirical mode decomposition proceeds as follows:
Step 1. Initialize: $r_0(t) = x(t)$, $i = 1$.
Step 2. Extract the i-th IMF.
Step 2.1. Initialize: $h_0(t) = r_{i-1}(t)$, $j = 1$.
Step 2.2. Find all local maxima and minima of $h_{j-1}(t)$. Interpolate the maxima and the minima with cubic splines to form the upper and lower envelopes of $h_{j-1}(t)$.
Step 2.3. Compute the mean of the upper and lower envelopes of $h_{j-1}(t)$, denoted $m_{j-1}(t)$.
Step 2.4. Compute the difference $h_j(t) = h_{j-1}(t) - m_{j-1}(t)$.
Step 2.5. If $h_j(t)$ satisfies the IMF conditions, designate it as $c_i(t) = h_j(t)$. Otherwise, set $j = j + 1$ and return to Step 2.2.
Step 3. After extracting the i-th IMF, compute the remaining signal $r_i(t) = r_{i-1}(t) - c_i(t)$.
Step 4. If $c_i(t)$ or $r_i(t)$ satisfies the given termination condition, stop and designate the final remaining signal as $r_n(t)$ (n = i). Otherwise, set $i = i + 1$ and return to Step 2.
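The sifting procedure above can be sketched with NumPy and SciPy's `CubicSpline`. This is a simplified illustration, not the implementation used in this chapter, and it makes three assumptions. First, the IMF test of Step 2.5 is replaced by the common standard-deviation (SD) stopping criterion, with a cap on sifting iterations. Second, the termination condition of Step 4 is taken to be either a residue with too few extrema to build envelopes or a maximum number of IMFs. Third, the record ends are handled crudely by pinning both envelopes to the end samples. Dedicated EMD codes use more careful end treatment.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def local_extrema(h):
    """Indices of interior local maxima and minima of a sampled signal."""
    d = np.diff(h)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def mean_envelope(t, h):
    """Steps 2.2-2.3: mean of the cubic-spline envelopes, or None if they cannot be built."""
    maxima, minima = local_extrema(h)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    last = len(h) - 1
    imax = np.concatenate(([0], maxima, [last]))   # crude end treatment (assumption)
    imin = np.concatenate(([0], minima, [last]))
    upper = CubicSpline(t[imax], h[imax])(t)
    lower = CubicSpline(t[imin], h[imin])(t)
    return 0.5 * (upper + lower)


def sift(t, r, sd_tol=0.2, max_iter=50):
    """Step 2: extract one IMF from the current remainder r."""
    h = r.copy()                                   # Step 2.1: h_0 = r_{i-1}
    for _ in range(max_iter):
        m = mean_envelope(t, h)
        if m is None:
            break
        h_new = h - m                              # Step 2.4: h_j = h_{j-1} - m_{j-1}
        sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
        h = h_new
        if sd < sd_tol:                            # Step 2.5 via the SD criterion (assumption)
            break
    return h


def emd(x, t=None, max_imfs=10):
    """Decompose x into IMFs c_1..c_n and a residue r_n, so that x = sum(c_i) + r_n."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x), dtype=float) if t is None else np.asarray(t, dtype=float)
    r = x.copy()                                   # Step 1: r_0 = x
    imfs = []
    # Step 4 (assumed form): stop when no envelopes can be built or max_imfs is reached
    while len(imfs) < max_imfs and mean_envelope(t, r) is not None:
        c = sift(t, r)                             # Step 2
        imfs.append(c)
        r = r - c                                  # Step 3: r_i = r_{i-1} - c_i
    return np.array(imfs), r
```

Because each IMF is subtracted from the running remainder, the IMFs and the residue always add back up to the input, whatever stopping rule is used.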
Finally, the raw vibration signal is decomposed into n IMFs $c_i(t)$, $i = 1, \ldots, n$, and one residue $r_n(t)$:
$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t)$$
In this work, the representative feature is a fractal feature parameter extracted from each IMF. The fractal-dimension method for time sequences requires k to be large enough, so a fractal feature parameter is used instead. The fractal feature parameter of each IMF is calculated as Equation 3 shows.

The number of IMFs differs from one raw vibration signal sample to another. Examination of the vibration signals shows that most of the operating-condition information is contained in the first few IMFs, so the remaining IMFs can be merged into a single component. A parameter L is therefore introduced to denote the number of IMFs used to extract representative features. The L-th IMF is redefined as $c_r(t)$, calculated as Equation 4 shows. The feature set of each raw signal thus contains L fractal feature parameters. For example, L can be set to 6.

Figure 2 summarizes all the IMFs and fractal features obtained from a bearing inner race fault signal sample. Table 1 presents the fractal feature parameters of the IMFs of vibration signal samples under different operating conditions. Table 1 shows that the fractal feature parameter sets of the same operating condition are similar, and that different operating conditions are easy to distinguish by their fractal feature parameters.
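Because the number of IMFs varies between samples, each decomposition is reduced to exactly L components before features are computed. The sketch below makes three assumptions:
- Equation 4 is taken to sum IMFs L to n into $c_r(t)$ without the residue. Including $r_n(t)$ would be a one-line change.
- The sketch raises an error when a sample has fewer than L IMFs, since the text does not say how that case is handled.
- The per-component feature function (Equation 3) is passed in as `feature_fn`, a hypothetical parameter name.

```python
import numpy as np


def limit_imfs(imfs, L):
    """Return exactly L components: c_1, ..., c_{L-1} and c_r = c_L + ... + c_n (assumed Eq. 4)."""
    imfs = np.asarray(imfs, dtype=float)
    n = imfs.shape[0]
    if L < 1 or n < L:
        raise ValueError(f"need at least L={L} IMFs, got {n}")
    c_r = imfs[L - 1:].sum(axis=0)                 # merge the trailing IMFs into one component
    return np.vstack([imfs[:L - 1], c_r[None, :]])


def feature_vector(imfs, L, feature_fn):
    """Apply feature_fn (for example, a fractal feature parameter) to each of the L components."""
    return np.array([feature_fn(c) for c in limit_imfs(imfs, L)])
```

With L = 6, every sample yields a six-element feature vector regardless of how many IMFs its decomposition produced.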

Results and discussion
The fractal-feature-parameter classification method for bearing fault diagnosis is applied to bearing fault signals from the Case Western Reserve University website [3].
The ball bearings are installed in a motor-driven mechanical system, as shown in Figure 3. A three-phase induction motor is connected through a self-aligning coupling to a dynamometer and a torque sensor. The dynamometer is controlled so that the desired torque load levels can be achieved. Vibration data are collected with an accelerometer attached to the housing by a magnetic base. The accelerometer is placed at the 12 o'clock position at the driven end of the motor housing. In machine condition monitoring, an accelerometer can provide rich information about the condition of several machine components. For example, the data measured by the accelerometer in this experiment are a mixture of signals reflecting the condition of the bearing inner race, outer race and rolling elements. The vibration data are recorded by a 16-channel DAT recorder at a sampling rate of 12,000 Hz.
To evaluate the classification performance of the IMF fractal feature parameters, an orthogonal quadratic discriminant function (OQDF-E) classifier [9] is trained and tested on the three data sets shown in Table 2. Table 3 gives the classification performance on each data set.
As Table 3 shows, the new bearing fault diagnosis method achieves good diagnostic accuracy. Table 4 extends the analysis and shows the classification performance in distinguishing normal from faulty operating conditions.

Decision-level fusion for bearing fault diagnosis
The previous section proposed a simple, fast fault diagnosis approach with good performance. This approach relies on a single sensor source and an individual classifier.
It achieves high accuracy in recognizing multiple fault types at the same fault severity, but its performance declines when multiple fault severities are involved. To deal with this problem, this section introduces a new method for bearing fault diagnosis based on decision-level fusion. The new fusion method has four stages:
1. vibration signal acquisition and decomposition;
2. fractal feature parameter extraction;
3. single-data-source fault diagnosis;
4. decision-level fusion for fault diagnosis.
The first three stages are the same as in the method described in the previous section, so only the last stage is described here.

Methodology
For a given pattern recognition problem, different classifiers perform differently. Simply working to improve the accuracy of a single classifier does not always give satisfactory results. A multiple classifier system (MCS) can overcome the limitations of individual classifiers and improve classification accuracy. Techniques for combining the outputs of several classifiers have been applied to a wide range of real problems, and MCSs have been shown to outperform the traditional approach of using a single high-performance classifier [26].
The most commonly used classifier combination approaches in MCSs include majority voting [30], weighted combination (weighted averaging) [18], probabilistic schemes [16,17], the Bayesian approach (naïve Bayes combination) [1,18,30] and the Dempster-Shafer (D-S) theory of evidence [5,30]. This section proposes a new classifier combination method that treats the combination process as a linear programming problem.
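Assume that K base classifiers are used in the MCS and that the bearing fault diagnosis problem has M fault states, including the normal condition. During multiple classifier combination, the outputs for a sample x can then be collected in the decision matrix
$$D(x) = \begin{bmatrix} P_1(F_1|x) & P_1(F_2|x) & \cdots & P_1(F_M|x) \\ P_2(F_1|x) & P_2(F_2|x) & \cdots & P_2(F_M|x) \\ \vdots & \vdots & \ddots & \vdots \\ P_K(F_1|x) & P_K(F_2|x) & \cdots & P_K(F_M|x) \end{bmatrix}$$
where $P_k(F_i|x)$ is the posterior probability of the i-th fault state given by the k-th base classifier.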
The new method fuses the posterior probabilities in the decision matrix to construct a global classifier E, which makes the final decision. The posterior probability output of E for each fault state is calculated as
$$P_E(F_i|x) = \sum_{k=1}^{K} \beta_k P_k(F_i|x), \qquad i = 1, \ldots, M$$
where $\beta_k$ ($\sum_{k=1}^{K} \beta_k = 1$) is a dynamic association weight in the MCS.

This decision-level fusion method rests on one assumption: a base classifier has higher real-time recognition accuracy when its posterior probabilities for the different fault states differ more. If an individual decision system is confident that the current operating condition belongs to a certain fault state, the posterior probability of that state will be much higher than the others. Using this hypothesis, the multiple classifier combination problem can be converted into a linear programming problem, whose objective function is built on the same hypothesis.

Current classifier ensemble methods mainly consider the statistical performance of the base classifiers. We find that real-time decision information can also be taken into account. The new MCS method therefore uses within-class decision support [13]. Within-class decision support is the degree of support that a base classifier's recognition output for one class receives from the other classifiers' recognition outputs for the same class in the MCS. This support is measured by the difference between the current output and its nearest output. For example, take $P_k(F_i|x)$, the posterior probability of the i-th state from the k-th base classifier. Its within-class decision support is measured by the difference between $P_k(F_i|x)$ and the nearest of the other classifiers' outputs for state i. The real-time decision support value (DSV) of a base classifier in the MCS is the sum of the within-class decision support values of all its class recognition outputs.
That is, $\mathrm{DSV}_k = \sum_{i=1}^{M} \mathrm{WDS}\big(P_k(F_i|x)\big)$, where $\mathrm{WDS}(\cdot)$ denotes within-class decision support.

In the proposed method, we set the following rule: the higher the real-time decision support value of a base classifier, the larger its dynamic association weight. In other words, the ordering of the dynamic association weights is determined by the ordering of the real-time decision support values of the base classifiers. These relationships enter the linear programming problem as relationship vectors, defined as Table 5 shows (for K = 3). Table 5 shows that each real-time relationship between DSVs is re-expressed by one or two relationship vectors, each of dimension K. All these relationship vectors form a relationship matrix, denoted R.

The objective function is then simplified to $\|D(x)^{T}\beta - \mathbf{1}_M\|_2$. Finally, the relationship matrix R expresses the constraint rules in the form $R\beta \le 0$. The complete linear programming problem is given in Equation 11, where N is the number of relationship vectors in the current relationship matrix.

Table 5. Real-time relationship between DSVs and the corresponding relationship vectors (K = 3)
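The following sketch computes DSVs from a K × M decision matrix and turns their ordering into relationship vectors. Two points are assumptions, because the text gives neither the exact within-class support formula nor the pairs compared in Table 5:
- Within-class support is taken as 1 minus the distance to the nearest other classifier's output for the same state, so a closer neighbour means more support.
- Every pair of classifiers is compared.

The constraint vectors are built as follows:
- If classifier a has a larger DSV than b, this yields a single vector with +1 at b and −1 at a, which encodes $\beta_b - \beta_a \le 0$.
- Equal DSVs yield two opposite vectors, which encode $\beta_a = \beta_b$.

This matches the "one or two relationship vectors" per relationship described above. All function names are hypothetical.

```python
import numpy as np
from itertools import combinations


def decision_support_values(D):
    """DSV_k for a K x M decision matrix D with D[k, i] = P_k(F_i | x).

    Assumed within-class support: 1 - |P_k(F_i|x) - P_l(F_i|x)| for the
    nearest other classifier l (a closer neighbour gives more support).
    """
    D = np.asarray(D, dtype=float)
    K = D.shape[0]
    if K < 2:
        raise ValueError("at least two base classifiers are required")
    diff = np.abs(D[:, None, :] - D[None, :, :])     # diff[k, l, i]
    diff[np.arange(K), np.arange(K), :] = np.inf     # skip a classifier's own output
    support = 1.0 - diff.min(axis=1)                 # K x M within-class support
    return support.sum(axis=1)                       # sum over all M fault states


def relationship_matrix(dsv, tol=1e-9):
    """Rows r with r @ beta <= 0 encoding 'larger DSV -> weight at least as large'."""
    dsv = np.asarray(dsv, dtype=float)
    K = len(dsv)
    rows = []
    for a, b in combinations(range(K), 2):
        r = np.zeros(K)
        if abs(dsv[a] - dsv[b]) <= tol:              # equal DSVs: beta_a = beta_b
            r[a], r[b] = 1.0, -1.0
            rows.extend([r, -r])
        else:
            hi, lo = (a, b) if dsv[a] > dsv[b] else (b, a)
            r[lo], r[hi] = 1.0, -1.0                 # beta_lo - beta_hi <= 0
            rows.append(r)
    return np.array(rows)


def satisfies_constraints(R, beta, tol=1e-12):
    """True if the weight vector beta satisfies R @ beta <= 0."""
    return bool(np.all(R @ np.asarray(beta, dtype=float) <= tol))
```

Equal weights $\beta_k = 1/K$ always satisfy $R\beta \le 0$ under this encoding, so the constraint set is never empty.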
Solving the linear programming problem above yields the dynamic association weight vector β. These weights give the fusion decision vector of the global classifier E. The final bearing fault diagnosis decision is the fault state $F_{i^*}$ with
$$i^* = \arg\max_{1 \le i \le M} \sum_{k=1}^{K} \beta_k P_k(F_i|x)$$
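Once β is known, fusion and the final decision reduce to a weighted sum over the rows of the decision matrix followed by an argmax. In this minimal sketch (function names hypothetical), the rows of `D` are base classifiers and the columns are fault states, so that `D.T @ beta` is the fused posterior vector.

```python
import numpy as np


def fuse(D, beta):
    """Fused posterior vector: P_E(F_i | x) = sum_k beta_k * P_k(F_i | x)."""
    D = np.asarray(D, dtype=float)
    beta = np.asarray(beta, dtype=float)
    if D.shape[0] != beta.shape[0]:
        raise ValueError("one weight per base classifier is required")
    if not np.isclose(beta.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    return D.T @ beta


def decide(D, beta):
    """Index of the diagnosed fault state (argmax of the fused posteriors)."""
    return int(np.argmax(fuse(D, beta)))
```

Setting every $\beta_k = 1/K$ produces the same decision as the sum rule, which is used as the comparison baseline later in this section.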

Results and discussion
The decision-level fusion method for bearing fault diagnosis is also applied to rotating machinery data from the Case Western Reserve University website [3]. In this experiment, vibration signals are collected from accelerometers attached to the motor at different positions, as Figure 4 shows. A dynamometer is used to control the torque load level. We study the recognition of four operating conditions under four loads (0, 1, 2 and 3 hp) with fault diameters of 7, 14 and 21 mils. The four operating conditions are normal condition, outer race fault, inner race fault and ball fault.

Two data sets, presented in Table 6, are constructed to test the diagnosis performance of the new decision-level fusion method. The samples of each data set cover all four operating conditions and all four loads. Each class in both data sets has 160 samples, divided into two equal halves, one for training and the other for testing. Data set A is a four-class classification task corresponding to the four operating conditions. Data set B is a ten-class classification task corresponding to various grades of different faults.

The samples come from two different sensor sources, with half of the samples from each source. If the samples of each sensor source are regarded as a subset, each data set has two subsets. For example, data set A has two subsets, A1 and A2. A1 consists of the samples from the driven end accelerometer, and A2 of those from the fan end accelerometer. Table 7 describes the elements of data set A in detail.
Data set  Class label  Subset  Sensor source             Training samples  Testing samples
A         1            A1      driven end accelerometer  40                40
A         1            A2      fan end accelerometer     40                40
A         2            A1      driven end accelerometer  40                40
A         2            A2      fan end accelerometer     40                40
A         3            A1      driven end accelerometer  40                40
A         3            A2      fan end accelerometer     40                40
A         4            A1      driven end accelerometer  40                40
A         4            A2      fan end accelerometer     40                40
Table 7. Description of the elements of data set A

In this work, two different classifiers, k-NN (k = 7) and a Parzen classifier, are used for the fault diagnosis task. Each classifier identifies the operating condition of the rotating machinery separately from the vibration signals of the driven end and the fan end accelerometers. Each data set therefore yields four individual decision system results, and the MCS is composed of these four base classifiers. Table 8 gives the recognition accuracy of the individual classifiers on the subsets of data sets A and B. The individual classifiers attain high bearing diagnosis accuracy on data set A. However, they cannot maintain the same performance on data set B, whose fault diagnosis task extends to various grades of different fault conditions.

Table 9 shows the fault diagnosis performance of the new decision-level fusion model using the multiple classifier system. The new model achieves high recognition accuracy even on the more difficult diagnosis task. In the testing phase on data set B, its accuracy is higher than that of every base classifier in Table 8, by 6.5 percentage points on average.
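The fusion step needs posterior probabilities, not just labels, from each base classifier. The sketch below shows one common way to obtain them, as an assumption rather than the chapter's exact setup:
- For k-NN, the posterior for each class is the fraction of the k nearest training samples in that class.
- For the Parzen classifier, it is the normalized sum of Gaussian windows over each class's training samples. The bandwidth `h` is a hypothetical parameter.

Stacking the outputs of the four base classifiers row by row gives the decision matrix used by the fusion method.

```python
import numpy as np


def knn_posteriors(X_train, y_train, x, k=7, n_classes=None):
    """Fraction of the k nearest training samples (Euclidean distance) in each class."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=int)
    if n_classes is None:
        n_classes = int(y_train.max()) + 1
    k = min(k, len(y_train))
    dist = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]
    return np.bincount(y_train[nearest], minlength=n_classes) / k


def parzen_posteriors(X_train, y_train, x, h=1.0, n_classes=None):
    """Class posteriors from Gaussian Parzen windows, with priors equal to class frequencies."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=int)
    if n_classes is None:
        n_classes = int(y_train.max()) + 1
    sq_dist = np.sum((X_train - np.asarray(x, dtype=float)) ** 2, axis=1)
    weights = np.exp(-0.5 * sq_dist / h ** 2)       # Gaussian window per training sample
    scores = np.bincount(y_train, weights=weights, minlength=n_classes)
    total = scores.sum()
    if total == 0.0:                                # every window underflowed to zero
        return np.full(n_classes, 1.0 / n_classes)
    return scores / total
```

With class-frequency priors, the 1/n_c density normalization and the n_c/n prior cancel. The Parzen posterior is therefore simply each class's share of the total window weight.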
Data set  Training accuracy  Testing accuracy
A         100%               100%
B         100%               93.5%
Table 9. Fault diagnosis performance using the new fusion model

To further analyze the performance of the new fusion model, an additional k-NN classifier (k = 3) is added to the multiple classifier system. The extended system is then used to test fault diagnosis performance on data set B. The sum rule serves as the baseline for comparison with the new approach. Table 10 presents the comparison results.

Feature-level fusion for bearing fault diagnosis
This section proposes a new multi-source feature-level fusion model for bearing fault diagnosis using GEP. At present, feature-level fusion for fault diagnosis has received far less research attention than decision-level fusion, mainly because feature-level fusion is more difficult. However, feature-level fusion can extract fault feature information more effectively. It therefore offers a way to improve the performance and robustness of bearing fault diagnosis systems.

Methodology
GEP was invented by Ferreira [8] as a natural development of genetic algorithms and genetic programming. GEP uses linear chromosomes composed of genes containing terminal and non-terminal symbols. Chromosomes can be modified by mutation, transposition, root transposition, gene transposition, gene recombination, and one-point and two-point recombination. Each GEP gene consists of a head and a tail. The head contains function (non-terminal) and terminal symbols, while the tail contains only terminal symbols. For each problem, the user chooses the head length h, which then determines the tail length t:
$$t = (n - 1) \times h + 1$$
where n is the number of arguments of the function with the most arguments. This choice guarantees that every gene decodes into a valid expression. Even a head made entirely of n-ary functions needs at most $(n - 1)h + 1$ terminals, and the tail always supplies that many.
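The head/tail rule and GEP's breadth-first (Karva) reading of a gene can be illustrated with a short sketch. This is an illustration, not the chapter's implementation. Two details are assumptions: the one-letter codes `Q` (sqrt) and `E` (exp), and the protected versions of `/`, sqrt and exp, which GEP implementations commonly use so that every gene evaluates to a number. Symbols after the last one the tree needs are simply unused. This is how fixed-length GEP genes encode expressions of different sizes.

```python
import math

# Function symbols and their arities; 'Q' (sqrt) and 'E' (exp) are hypothetical codes.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1, "E": 1}
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-12 else 1.0,   # protected division
    "Q": lambda a: math.sqrt(abs(a)),                     # protected square root
    "E": lambda a: math.exp(min(a, 50.0)),                # overflow-capped exp
}


def tail_length(h, n_max):
    """t = (n - 1) * h + 1 for head length h and maximum arity n_max."""
    return (n_max - 1) * h + 1


def decode(gene):
    """Read a Karva expression breadth-first into (symbol, children) nodes."""
    root = (gene[0], [])
    level, pos = [root], 1
    while level:
        next_level = []
        for sym, children in level:
            for _ in range(ARITY.get(sym, 0)):     # each function consumes `arity` symbols
                child = (gene[pos], [])
                pos += 1
                children.append(child)
                next_level.append(child)
        level = next_level
    return root


def evaluate(node, env):
    """Evaluate a decoded tree; terminals are looked up in env."""
    sym, children = node
    if sym in OPS:
        return OPS[sym](*(evaluate(c, env) for c in children))
    return env[sym]
```

For example, the gene `Q*-+abcd` decodes to sqrt((a - b) * (c + d)). In the gene `+*aabab` (head `+*a`, tail `abab`), the last two symbols are never read.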
The flow of GEP is as follows:
Step 1. Set the control parameters, select the function set, and initialize the population.
Step 2. Evaluate the fitness of each individual in the population.
Step 3. Apply genetic operators such as selection, mutation, insertion sequence transposition, recombination, random constant mutation and random constant insertion sequence transposition to create a new population.
Step 4. Apply the best-individual preservation (elitism) strategy.
Step 5. If the required precision is reached, stop the evolution; otherwise, return to Step 2.
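Steps 1 to 5 can be put together in a toy run. Everything problem-specific here is a hypothetical choice made for illustration:
- the target function $a^2 + a$;
- the function set {+, −, ×};
- truncation selection;
- one-point recombination plus point mutation as the only operators. The chapter lists more, including transpositions.

The tail is refilled only with terminals, so every offspring stays a valid gene. The best individual is copied unchanged into the next generation (Step 4), so the best fitness never decreases.

```python
import random

FUNCS = {"+": 2, "-": 2, "*": 2}
TERMS = ["a"]
HEAD = 5
TAIL = HEAD * (2 - 1) + 1          # t = (n - 1) * h + 1 with n = 2


def random_gene(rng):
    head = [rng.choice(list(FUNCS) + TERMS) for _ in range(HEAD)]
    return head + [rng.choice(TERMS) for _ in range(TAIL)]


def express(gene, a):
    """Decode the gene breadth-first (Karva notation) and evaluate it at a."""
    children, level, pos = {}, [0], 1
    while level:
        next_level = []
        for idx in level:
            arity = FUNCS.get(gene[idx], 0)
            children[idx] = list(range(pos, pos + arity))
            pos += arity
            next_level.extend(children[idx])
        level = next_level

    def ev(i):
        if gene[i] in TERMS:
            return a
        x, y = (ev(c) for c in children[i])
        return x + y if gene[i] == "+" else x - y if gene[i] == "-" else x * y

    return ev(0)


def fitness(gene):
    # Negative absolute error against the toy target a^2 + a (0 means a perfect fit).
    return -sum(abs(express(gene, a) - (a * a + a)) for a in (-2, -1, 0, 1, 2))


def mutate(gene, rng, rate=0.1):
    g = list(gene)
    for i in range(len(g)):
        if rng.random() < rate:
            g[i] = rng.choice(list(FUNCS) + TERMS) if i < HEAD else rng.choice(TERMS)
    return g


def recombine(g1, g2, rng):
    cut = rng.randrange(1, len(g1))                 # one-point recombination
    return g1[:cut] + g2[cut:]


def run_gep(pop_size=50, generations=200, seed=1):
    rng = random.Random(seed)
    pop = [random_gene(rng) for _ in range(pop_size)]          # Step 1
    history = []
    while True:
        ranked = sorted(pop, key=fitness, reverse=True)         # Step 2: evaluate
        best = ranked[0]
        history.append(fitness(best))
        if history[-1] == 0 or len(history) == generations:     # Step 5: stop?
            return best, history
        parents = ranked[: pop_size // 2]                       # Step 3: select and vary
        offspring = [mutate(recombine(rng.choice(parents), rng.choice(parents), rng), rng)
                     for _ in range(pop_size - 1)]
        pop = [best] + offspring                                # Step 4: keep the best
```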
The new feature-level fusion model uses GEP to address the multi-sensor fusion problem. Assume that I sensors are used in machine condition monitoring. For each sensor, the raw signal is divided into segments of equal duration, and features are extracted from each segment. In this chapter, only time-domain statistical characteristics are considered as machine operating signal features. These time-domain feature parameters are presented in Equations (13) to (23), where x(t) is a signal series and N is its number of data points.
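Equations (13) to (23) define eleven time-domain feature parameters. The sketch below computes eleven statistics that are widely used for vibration signals. Which eleven the chapter actually uses is an assumption here, so the list should be checked against Equations (13) to (23). Population (1/N) normalization is used throughout, and the segment is assumed to be non-constant so that no ratio divides by zero.

```python
import numpy as np


def time_domain_features(x):
    """Common time-domain statistics of a signal segment x of N points (assumed feature set)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std()                                   # population standard deviation (1/N)
    rms = np.sqrt(np.mean(x ** 2))
    abs_mean = np.mean(np.abs(x))
    peak = np.max(np.abs(x))
    sqrt_amp = np.mean(np.sqrt(np.abs(x))) ** 2     # square-root amplitude
    return {
        "mean": mean,
        "std": std,
        "rms": rms,
        "peak": peak,
        "peak_to_peak": x.max() - x.min(),
        "skewness": np.mean((x - mean) ** 3) / std ** 3,
        "kurtosis": np.mean((x - mean) ** 4) / std ** 4,
        "crest_factor": peak / rms,
        "shape_factor": rms / abs_mean,
        "impulse_factor": peak / abs_mean,
        "clearance_factor": peak / sqrt_amp,
    }
```

Each sensor's segment then contributes one such feature set, and the GEP model fuses the sets from all I sensors.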
In the pattern recognition process of bearing fault diagnosis, we assume that there are M conditions, including the normal condition. Let $S_m^i$ denote the set of all training samples belonging to the m-th condition ($1 \le m \le M$) from the i-th sensor source. The feature-level fusion model seeks a way to fuse the features from different sensor sources. The new GEP-based model does this by searching for a feature recognition function ϕ. This function maps the feature space to another space in which samples of the same class are similar and samples of different classes are dissimilar. The feature recognition function ϕ is then used in the reverse direction to guide the construction of the multi-source feature fusion model.
The functions +, −, ×, /, sqrt and exp are selected as the GEP function set, and the number of generations is set to 5000. The fitness function is expressed in terms of $\sigma_m$, the mean of the mapping values of all m-th condition samples:
$$\sigma_m = \frac{1}{|S_m|} \sum_{x \in S_m} \phi(x)$$
where $S_m$ denotes the set of m-th condition training samples.

After GEP training, an optimized feature recognition function ϕ is obtained. Using ϕ, the mean mapping value of each operating condition's samples from a given sensor source can be calculated. Only the correctly classified samples are used to calculate the means that make up the multi-source feature evaluation matrix, as Equation 26 shows. Each element of this matrix is the mean value of one feature component for one operating condition. For example, $\rho_2(1)$ is the mean of the first feature over all correctly classified samples of the second operating condition.
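Given a trained mapping ϕ, the class means $\sigma_m$ and the evaluation-matrix entries $\rho_m(j)$ can be computed as below. The classification rule is an assumption, since the text does not state it: a sample counts as correctly classified when its mapping value is closest to the $\sigma_m$ of its own condition. The `phi` argument stands in for an evolved GEP function; the toy mapping used to exercise the code is hypothetical. Indices are zero-based, so $\rho_2(1)$ corresponds to `rho[1, 0]`.

```python
import numpy as np


def class_means(phi_values, labels, n_classes):
    """sigma_m: mean mapping value of all samples of condition m."""
    return np.array([phi_values[labels == m].mean() for m in range(n_classes)])


def evaluation_matrix(features, labels, phi, n_classes):
    """Return (sigma, rho), where rho[m, j] is the mean of feature j over the
    correctly classified samples of condition m."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=int)
    phi_values = np.array([phi(f) for f in features])
    sigma = class_means(phi_values, labels, n_classes)
    # Assumed rule: predict the condition whose sigma_m is nearest to phi(x).
    predicted = np.argmin(np.abs(phi_values[:, None] - sigma[None, :]), axis=1)
    correct = predicted == labels
    rho = np.array([features[correct & (labels == m)].mean(axis=0)
                    for m in range(n_classes)])
    return sigma, rho
```

If a condition has no correctly classified samples, its row of `rho` becomes NaN. That outcome signals that the evolved mapping does not yet separate the conditions.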

Results and discussion
To evaluate the proposed feature-level fusion model, we apply it to bearing fault diagnosis using data also taken from the Case Western Reserve University website [3]. Three experiments over three data sets are conducted, as Table 11 shows. The data are collected under various operating loads from the motor driven end and fan end accelerometers.
Table 11. Description of three data sets
Each data set covers four operating conditions and four loads (0, 1, 2 and 3 hp). Each class has 160 samples, divided into two equal halves, one for training and the other for testing. The three experiments have different aims:
- Data set A: identify different fault types.
- Data set B: investigate the diagnosis performance on developing faults when the fusion model is trained on incipient fault samples.
- Data set C: test the diagnosis performance on incipient faults when the model is trained on severe fault samples.

Table 12 gives the results of these three experiments. It shows that the new GEP-based feature-level fusion model achieves stable, good diagnosis performance. In the experiment on data set C, testing performance is even higher than training performance. In other words, when the model is trained on severe fault samples, it can readily identify incipient faults.

The method is also tested with a single sensor source, to observe how performance changes when the model uses multi-source rather than single-source information. Table 13 compares the performance obtained with multiple sensors (here, two) and with a single sensor.
Table 13 shows that, with the new feature-level fusion model, multi-sensor testing performance is substantially higher than single-sensor testing performance.

Data set                                        A      B      C
Increase in multi-sensor testing performance    0.56   0.48   0.57
Table 13. Performance comparison between multi-sensor and single sensor

Conclusion
This chapter has introduced several new methods for bearing fault diagnosis that use information fusion and intelligent algorithms. Bearing fault diagnosis has been an active research subject for over a decade and attracts a large number of researchers from different areas. However, most current techniques deal mainly with single-source data. Many studies have shown that an individual decision system with a single data source achieves only a limited classification capability, which may not be sufficient for a particular application. We have therefore studied a new approach to bearing fault diagnosis using information fusion technology and intelligent algorithms.
Information fusion is still an active research field. Generally, information fusion can take place at three levels: sensor level, feature level and decision level. Here, we have proposed a new feature-level fusion method and a new decision-level fusion method for bearing fault diagnosis. The feature-level fusion method uses GEP, a recent intelligent algorithm, and it is a parallel fusion method. The decision-level fusion approach is based on a new multiple classifier ensemble method. It analyzes the raw vibration signal and performs feature extraction using EMD and fractal feature parameter calculation. The experimental results show that these new fusion models achieve good diagnostic performance on the bearing fault diagnosis task, higher than that of traditional single-sensor applications.