Data-Driven Methodologies for Structural Damage Detection Based on Machine Learning Applications

Structural health monitoring (SHM) is an important research area whose main interest is the damage identification process. Different information about the state of the structure can be obtained in this process; among them, damage detection, localization, and classification are mainly studied in order to avoid unnecessary maintenance procedures in civilian and military structures in several applications. To carry out SHM in practice, two different approaches are used: the first is model-based and requires building a very detailed model of the structure, while the second is data-driven and uses information collected from the structure under different structural states, which is then analyzed. For the latter, statistical analysis and pattern recognition have demonstrated their effectiveness in the damage identification process, because real information is obtained from the structure through sensors permanently installed on the observed object, allowing real-time monitoring. This chapter describes a damage detection and classification methodology that combines a piezoelectric active system, attached to the structure under evaluation and working in several actuation phases, with principal component analysis and machine learning algorithms acting as a pattern recognition method. The chapter also describes the developed approach and the results obtained when it is tested on an aluminum plate.


Introduction
Structural health monitoring (SHM) is a very interesting area whose main objective is damage identification using sensors permanently installed on the structure. In general, one of the aims is to monitor a structure in real time in order to know its current state, starting from damage detection. From this point of view, damage detection is extremely important, both for safety and because it helps manage downside risk, reducing costs by improving visual inspection and maintenance processes [1,2]. Currently, new developments in several areas involve increasingly complex structures. In many cases, the interaction between the structure and the remaining elements introduces interdependences, which can be nonlinear and increase the difficulty of the damage detection process. In these cases, a multicomponent and systemic approach can be incorporated to obtain a safe and optimal maintenance model [3]. It is also important to note that some infrastructure has been in use for many years, for example historical buildings, bridges, and aeronautical and aerospace structures, among others. This aging process brings new challenges for SHM systems [4].
It is also worth highlighting the wide range of opportunities offered by automating the structural health monitoring process, which can be combined with other automation systems such as intelligent transportation systems (ITS) and automated guided vehicles, among others. This symbiosis can offer benefits and new perspectives on the use of structures by providing additional information that SHM systems can leverage to increase reliability, robustness, and efficiency, reduce the probability of error, and support better decision-making [5]. SHM systems have a wide range of applications in civil infrastructure such as bridges [24] and buildings [6]. Similarly, SHM systems have been applied to monitor mechanical components and vehicles such as helicopter fuselages [7], onshore [8,9] and offshore [10] wind turbines, aerospace equipment [11], aircraft [12], high-speed trains [13], aircraft turbines [14], and boats [15], as well as marine renewable energy equipment [16]. Environmental conditions must also be considered to ensure robust damage detection; in this sense, some works have been introduced to compensate for the effects of temperature changes [17,18].
Regardless of the infrastructure design or the technology used to support maintenance decision-making, several factors must be considered, such as information about the physical infrastructure, administrative information, use, reliability, maintainability, operability, bearing capacity, and the adopted maintenance policy [19]. The drift probability must also be taken into account [20]. Defining the best inspection process is complex; for instance, for machines that operate continuously, maintenance methodologies must be developed to avoid failures or breakdown maintenance, and approaches such as preventive maintenance and reliability-centered maintenance, among others, need to be included [21]. This chapter describes a methodology for damage detection and classification and its experimental validation with data from an aluminum plate instrumented with piezoelectric transducers permanently attached to its surface. The chapter is organized as follows: Section 2 presents the general concepts and methods used in the methodology, Section 3 explains the methodology, Section 4 describes the experimental setup, Section 5 presents the results, and the conclusions are given at the end.

General concepts
The methodology described in this work uses several well-known data-driven methods; this section introduces the main concepts.

Principal component analysis
One of the greatest difficulties in data analysis arises when the amount of data is very large and the relationships within it are not apparent or are very difficult to find. Principal component analysis (PCA) emerged as a very useful tool to reduce and analyze large amounts of information. The technique was described by Pearson in 1901 as a method of multivariate analysis and was also used by Hotelling in 1933 [22]. The method finds the principal components, which provide a reduced version of the original dataset that retains the relevant information explaining the variation in the data. To find these variables, the original coordinate space is transformed into a new space in which the data are re-expressed, with the aim of filtering out noise and redundancies. These redundancies are measured through the correlation between the variables [23].
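The change of coordinates described above can be illustrated with a short covariance-based PCA sketch in Python with NumPy. It is not the authors' MATLAB implementation: the function names `pca_fit` and `pca_project` are hypothetical, and the toy data only show that, when two variables are strongly correlated (a redundancy), the first principal component captures almost all of the variance and the resulting scores are uncorrelated.

```python
import numpy as np

def pca_fit(X, n_components):
    """Covariance-based PCA. X: rows = experiments, columns = variables."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    C = np.cov(X - mean, rowvar=False)          # sample covariance matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(C)        # C is symmetric; eigenvalues come out ascending
    order = np.argsort(eigvals)[::-1]           # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P = eigvecs[:, :n_components]               # loading matrix: the principal directions
    explained = eigvals[:n_components] / eigvals.sum()
    return mean, P, explained, eigvals[:n_components]

def pca_project(X, mean, P):
    """Scores: coordinates of the centered data in the new (principal) space."""
    return (np.asarray(X, dtype=float) - mean) @ P

# Toy data: variable 2 is almost a scaled copy of variable 1, variable 3 is small noise
rng = np.random.default_rng(0)
t = rng.standard_normal(200)
X = np.column_stack([t, 2 * t + 0.05 * rng.standard_normal(200), 0.1 * rng.standard_normal(200)])
mean, P, explained, variances = pca_fit(X, n_components=2)
T = pca_project(X, mean, P)
print("explained variance ratio:", explained)
```

When there are many more variables than experiments, as with sampled time signals, an SVD of the centered data matrix is usually preferred to forming the covariance matrix explicitly.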
PCA can be implemented in two ways: one based on the correlation matrix and the other based on the covariance matrix. PCA is not invariant to scale, so the data under study must be normalized; several methods can be used for this, as shown in [23,24]. In many applications, PCA is used to reduce the dimensionality of the data so that a subsequent process can work with fewer variables. Many toolboxes are currently available to apply PCA and analyze the reduced data it provides [25], which is one of the reasons PCA is still widely used. More information about PCA and the normalization process can be found in [24,26-28].
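Both points in this paragraph, the equivalence of the correlation and covariance approaches once the data are normalized and the lack of scale invariance, can be checked numerically. The sketch below assumes column-wise autoscaling (mean-centering and division by the sample standard deviation) as the normalization; the methods in [23,24] may use a different scheme, and the helpers `autoscale` and `leading_direction` are hypothetical names.

```python
import numpy as np

def autoscale(X, mean=None, std=None):
    """Column-wise autoscaling. Pass the training mean/std to scale new data consistently."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0) if mean is None else mean
    std = X.std(axis=0, ddof=1) if std is None else std
    std = np.where(std == 0, 1.0, std)           # leave constant columns unscaled
    return (X - mean) / std, mean, std

def leading_direction(M):
    """Eigenvector of the symmetric matrix M with the largest eigenvalue (sign made deterministic)."""
    w, V = np.linalg.eigh(M)
    v = V[:, np.argmax(w)]
    return v if v[np.argmax(np.abs(v))] > 0 else -v

rng = np.random.default_rng(1)
mixing = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 1.0]])
X = rng.standard_normal((100, 3)) @ mixing       # three correlated variables

Xs, mu, sigma = autoscale(X)
v_corr = leading_direction(np.corrcoef(X, rowvar=False))    # correlation-based PCA
v_cov_scaled = leading_direction(np.cov(Xs, rowvar=False))  # covariance-based PCA on autoscaled data

# Same data with the first variable in a unit 1000 times smaller (e.g. mm instead of m)
X_units = X * np.array([1000.0, 1.0, 1.0])
v_cov_raw = leading_direction(np.cov(X_units, rowvar=False))  # now dominated by variable 1
print(v_corr, v_cov_scaled, v_cov_raw)
```

The first two directions coincide, while the covariance-based result on the rescaled data points almost entirely along the first variable, only because of its unit.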

Machine learning
Since Alan Turing showed interest in machines that learn, this area has remained at the forefront of research, steadily increasing its popularity and expanding its field of application [29]. It has revolutionized the way complex problems are tackled. In the relentless pursuit of the best tools for data analysis, machine learning stands out by providing a set of pattern recognition strategies that can find relationships in data that at first glance appear uncorrelated and for which a deterministic mathematical model is very difficult to define. Machine learning strategies and bio-inspired algorithms avoid this difficulty through mechanisms designed to find the answer by themselves. In SHM and related areas, machine learning has been used to detect problems such as breaks, corrosion, cracks, impact damage, delamination, disbonding, and fiber breakage (some pertinent to metals and others to composite materials [30]); in addition, it has been used to provide information about the future behavior of a structure under extreme events such as earthquakes [31].
Depending on how the algorithms work, machine learning can be classified into two main approaches: unsupervised and supervised learning. In unsupervised learning, the information is grouped and interpreted using only the input data, whereas supervised learning requires information about the output data to perform the learning task. Figure 1 shows this classification and the tasks for which each type of learning can be used.
Since this work aims to classify damage, supervised learning is used. In practice, this task is performed with the Classification Learner app of MATLAB®, and Table 1 lists the methods used in this work.
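To make the KNN entries of Table 1 more concrete, the following NumPy sketch reimplements plain and distance-weighted k-nearest-neighbour voting. The presets follow the defaults that MathWorks documents for the Classification Learner (Fine KNN: 1 neighbour; Medium KNN: 10; Coarse KNN: 100; Weighted KNN: 10 neighbours with squared-inverse distance weights; Euclidean distance throughout), but they should be checked against the MATLAB release in use. The ensemble methods (Bagged Trees, Subspace KNN) are not reproduced, the function and preset names are hypothetical, and clamping k to the size of the training set is a choice of this sketch.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=1, weighting="equal"):
    """k-NN classification with Euclidean distance.

    weighting="equal": majority vote among the k neighbours.
    weighting="squared_inverse": each neighbour votes with weight 1 / d**2.
    Ties are resolved in favour of the first class in sorted order.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    k = min(k, len(X_train))                          # cannot use more neighbours than training points
    dist = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    neighbours = np.argsort(dist, axis=1)[:, :k]      # indices of the k closest training points
    classes = np.unique(y_train)
    predictions = []
    for i, idx in enumerate(neighbours):
        if weighting == "squared_inverse":
            w = 1.0 / (dist[i, idx] ** 2 + 1e-12)     # small constant avoids division by zero
        else:
            w = np.ones(len(idx))
        votes = [w[y_train[idx] == c].sum() for c in classes]
        predictions.append(classes[int(np.argmax(votes))])
    return np.array(predictions)

PRESETS = {
    "Fine KNN": dict(k=1),
    "Medium KNN": dict(k=10),
    "Coarse KNN": dict(k=100),
    "Weighted KNN": dict(k=10, weighting="squared_inverse"),
}

# Synthetic 2-D "scores" for four hypothetical structural states (not data from the chapter)
rng = np.random.default_rng(2)
centres = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
X_train = np.vstack([c + 0.5 * rng.standard_normal((20, 2)) for c in centres])
y_train = np.repeat(np.arange(4), 20)
X_test = np.vstack([c + 0.5 * rng.standard_normal((5, 2)) for c in centres])
y_test = np.repeat(np.arange(4), 5)

accuracy = {name: float(np.mean(knn_predict(X_train, y_train, X_test, **kw) == y_test))
            for name, kw in PRESETS.items()}
print(accuracy)
```

On this synthetic example the 100-neighbour vote covers the whole training set of 80 points, so Coarse KNN assigns every test point to the same state, whereas the small-neighbourhood presets separate the four clusters.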

Damage classification methodology
The methodology used in this work is aimed at damage detection and classification from a pattern recognition point of view. The methodology starts by defining a reference pattern from different states of the structure: in this work, data from the healthy state and from the different damage scenarios are used as inputs to the machines. This stage is called training and is carried out as shown in Figure 2.
In general terms, the process includes a pre-processing step in which all the experiments are organized in one matrix per actuation phase, as in Figure 3, and normalization is applied before the PCA models are created. After the training step, the same experiments are performed on the structure in unknown scenarios; these data are pre-processed, projected onto the principal components, and fed to the trained machine to determine which state they correspond to. Figure 4 describes the steps of this process. Figure 5 shows a scheme of the SHM system, which is composed of a four-channel oscilloscope with a USB interface, an arbitrary waveform generator, a CPU as processing unit, and a switching device that automates the measurements.
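The pre-processing and projection steps can be sketched per actuation phase as follows. The matrix layout (one row per experiment, with the records of all sensors in that phase concatenated along the row) is an assumption about Figure 3, autoscaling stands in for the normalization actually used by the authors, and `unfold_phase` and `PhaseModel` are hypothetical names. The point illustrated is that experiments from unknown scenarios are scaled with the training mean and standard deviation and projected onto the training loadings, rather than used to fit a new model.

```python
import numpy as np

def unfold_phase(signals):
    """signals: (n_experiments, n_sensors, n_samples) -> (n_experiments, n_sensors * n_samples)."""
    signals = np.asarray(signals, dtype=float)
    n_exp, n_sensors, n_samples = signals.shape
    return signals.reshape(n_exp, n_sensors * n_samples)  # sensor records placed side by side

class PhaseModel:
    """Normalization statistics and PCA loadings learned from the training data of one phase."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X):
        self.mean = X.mean(axis=0)
        std = X.std(axis=0, ddof=1)
        self.std = np.where(std == 0, 1.0, std)
        Z = (X - self.mean) / self.std
        _, _, Vt = np.linalg.svd(Z, full_matrices=False)   # rows of Vt: principal directions
        self.P = Vt[: self.n_components].T
        return self

    def transform(self, X):
        # Unknown experiments reuse the TRAINING mean, std and loadings
        return ((X - self.mean) / self.std) @ self.P

# Hypothetical sizes: 4 PZTs -> 4 phases, 3 sensing PZTs per phase, 500 samples per record
rng = np.random.default_rng(3)
n_phases, n_train, n_unknown, n_sensors, n_samples = 4, 12, 2, 3, 500
models, scores = [], []
for phase in range(n_phases):
    train = unfold_phase(rng.standard_normal((n_train, n_sensors, n_samples)))
    unknown = unfold_phase(rng.standard_normal((n_unknown, n_sensors, n_samples)))
    model = PhaseModel(n_components=2).fit(train)
    models.append(model)
    scores.append(model.transform(unknown))   # scores passed on to the trained classifier
print([s.shape for s in scores])
```

How the scores of the four phases are combined before classification is not detailed in this section, so the sketch simply keeps one score matrix per phase.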

Experimental setup
The inspection process can be summarized in the following steps:
• A burst signal is applied to one PZT, and the remaining transducers are used as sensors.
• A multiplexing system changes the actuator and collects the information from the remaining sensors. This process is repeated as many times as there are piezoelectric transducers attached to the structure.
• Finally, a digitizer (an oscilloscope with a USB interface) captures the information collected by the sensors.
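The actuation scheme in the steps above can be outlined in a few lines. The excitation used here, a Hann-windowed sine burst, is a common choice in guided-wave inspection, but the chapter does not specify the burst shape; the frequency, number of cycles, and sampling rate are hypothetical, as are the names `tone_burst` and `actuation_phases`.

```python
import numpy as np

def tone_burst(freq_hz, n_cycles, fs_hz):
    """Sine burst of n_cycles periods shaped by a Hann window (assumed excitation shape)."""
    n = int(round(n_cycles * fs_hz / freq_hz))     # number of samples in the burst
    t = np.arange(n) / fs_hz
    return np.hanning(n) * np.sin(2 * np.pi * freq_hz * t)

def actuation_phases(n_pzt):
    """Phase i: PZT i is the actuator and every other PZT acts as a sensor."""
    return [(a, [s for s in range(n_pzt) if s != a]) for a in range(n_pzt)]

burst = tone_burst(freq_hz=10e3, n_cycles=3, fs_hz=1e6)   # hypothetical parameters
for phase, (actuator, sensors) in enumerate(actuation_phases(4), start=1):
    names = ", ".join(f"PZT{s + 1}" for s in sensors)
    print(f"Phase {phase}: actuator PZT{actuator + 1}; sensors {names}")
```

With four transducers this yields four actuation phases, each with one actuator and three sensors.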
The system stores the information in several files (in this case four, since there are four transducers) and pre-processes it as explained in the previous section. To validate the methodology, four structural states were used, the healthy state and three simulated damages, as shown in Figure 6. These damages are used to produce changes in wave propagation [27] and to provide different scenarios for validating the methodology.

Experimental results
To validate the methodology with several machine learning methods, three experiments were carried out. The objective is to determine the behavior of the machine learning methods described in Section 2 and their performance under different scenarios, obtained by changing the input data and the pre-processing step. In most cases, changes of this kind are responsible for false alarms in the damage identification process. Accordingly, the experiments examine the effect of attenuation with long (2.5 m) and short (0.5 m) cables, the addition of Gaussian noise to the acquired signals, and the use of a Golay filter in the pre-processing step. These experiments are described below.
First experiment: acquisition performed with a short cable (0.5 m) from the digitizer to the sensors; the acquired signals are filtered with a Golay filter, and white Gaussian noise is then added.
Second experiment: acquisition performed with a long cable (2.5 m) to the sensors; the signals are filtered with the Golay filter.
Third experiment: acquisition performed with a short cable (0.5 m) from the digitizer to the sensors; the signals are not filtered with the Golay filter.
As introduced above, the first experiment explores the influence of noise added to the data in order to determine how it affects the principal components. To this end, the Golay filter is first applied to reduce the influence of random components in the signals, and white Gaussian noise is then added. The methodology is then applied to the signals with and without noise to determine the influence of white noise on the detection process. An example of the signals used by the algorithms in actuation phase 2 is shown in Figure 7; similar results are obtained for all the signals.
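The signal treatment of the first experiment can be sketched as below. The sketch assumes that the "Golay filter" is the Savitzky-Golay smoothing filter (available in MATLAB as `sgolayfilt` and in SciPy as `scipy.signal.savgol_filter`, which also handles the signal ends properly); here it is written with NumPy only so that its least-squares construction is visible. The window length, polynomial order, signal-to-noise ratio, and synthetic signal are hypothetical, and the function names are illustrative.

```python
import numpy as np

def savgol_coefficients(half_window, polyorder):
    """Savitzky-Golay weights: value at the window centre of a least-squares polynomial fit."""
    i = np.arange(-half_window, half_window + 1)
    A = np.vander(i, polyorder + 1, increasing=True)   # columns i**0, i**1, ..., i**polyorder
    return np.linalg.pinv(A)[0]                        # row that yields the fitted constant term

def savgol_smooth(x, half_window=5, polyorder=2):
    c = savgol_coefficients(half_window, polyorder)
    # The weights are symmetric, so convolution equals sliding-window fitting;
    # mode="same" zero-pads, so the first and last half_window samples are less accurate.
    return np.convolve(x, c, mode="same")

def add_white_gaussian_noise(x, snr_db, rng):
    """Add zero-mean white Gaussian noise so that signal power / noise power = 10**(snr_db / 10)."""
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10)
    return x + np.sqrt(p_noise) * rng.standard_normal(x.shape)

rng = np.random.default_rng(4)
n = np.arange(4000)
clean = np.sin(2 * np.pi * n / 200)                    # synthetic stand-in for a sensor record
measured = clean + 0.05 * rng.standard_normal(n.size)  # synthetic measurement noise
filtered = savgol_smooth(measured)                     # step 1: Savitzky-Golay filtering
noisy = add_white_gaussian_noise(filtered, snr_db=20, rng=rng)  # step 2: added white noise
```

Running the methodology on both `filtered` and `noisy` mirrors the comparison with and without added noise described in the paragraph above.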

Figure 8a shows the first two principal components of the signals for actuation phase 1, which are then used to train the machines; training was performed with the methods of the MATLAB® Classification Learner listed in Table 1. The same behavior is observed in all the actuation phases.
As seen in Figures 8a and 8b, in the first experiment the first principal components are able to eliminate the noise, which shows that they are a good tool for defining the features fed to the machines.
Once the principal components are computed, the machines are trained with these data. Although all the machine learning methods were explored, only the worst and best results are shown here for clarity. Figure 9 shows the test confusion matrix of the Coarse KNN machine: the result was very poor in all cases, and most machines behaved in the same way. Figure 10 shows the test confusion matrix of the Bagged Trees machine, whose result was good in all cases. Fine KNN, Weighted KNN, and Subspace KNN also performed well, but only these few machines gave a good response. In general, the response of these machine learning algorithms was good with or without added noise, because PCA showed a strong ability to reject the noise.
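The confusion matrices in Figures 9 and 10 tabulate, for each true structural state, how the test experiments were classified. A minimal sketch of that bookkeeping follows, using the Classification Learner convention of true classes in rows and predicted classes in columns; the function name is hypothetical, and the labels and predictions are illustrative values, not results from the chapter.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, labels):
    """Count matrix: entry [i, j] = number of experiments of state labels[i] predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm

labels = ["healthy", "damage 1", "damage 2", "damage 3"]
# Hypothetical test labels and predictions (two test experiments per state)
y_true = ["healthy", "healthy", "damage 1", "damage 1", "damage 2", "damage 2", "damage 3", "damage 3"]
y_pred = ["healthy", "damage 1", "damage 1", "damage 1", "damage 2", "healthy", "damage 3", "damage 3"]

cm = confusion_matrix(y_true, y_pred, labels)
accuracy = np.trace(cm) / cm.sum()           # fraction of correctly classified experiments
recall = np.diag(cm) / cm.sum(axis=1)        # per-state true positive rate
print(cm)
print(f"accuracy = {accuracy:.2f}", recall)
```

A good machine concentrates the counts on the diagonal; off-diagonal entries show which states are confused with one another.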
In the second experiment, the acquisition system is connected with long cables and the Golay filter is used for pre-processing. In this case, some signals were poorly digitized because of the cable impedance, the noise, the low voltage of the stimulus, and other experimental factors. An example of the captured signals is shown in Figure 11, and Figure 12 shows the first two principal components obtained from the signals, which were used to train the machines.
As in the previous experiment, all the methods were explored, and the best and worst results are reported here. Figure 13 shows the confusion matrix of Weighted KNN, whose behavior was similar to that in the first experiment. Similar results were obtained with Fine KNN, Bagged Trees, and Subspace KNN.
Poor results were obtained with the other methods, such as Coarse KNN. Figure 14 shows this behavior, which is similar to that in experiment 1.
In the third experiment, a short cable was used and unfiltered signals were used to calculate the scores. Figure 15 shows the acquired signal in actuation phase 1, and Figure 16 shows the first two principal components of the signals. In this experiment, however, these data were not used to train the machines; instead, the principal components were projected into the machines trained in the first experiment to determine the influence of these changes on the results. Figure 17 shows the response of the Coarse KNN machine, which fails to classify this data series. Figure 18 shows the response of the Fine KNN machine; the result is similar to the previous case, that is, the machine provides a poor classification.

Conclusions and future work
The piezoelectric transducers, working as an active inspection system, provide a good means of producing mechanical waves in the material under evaluation. In all cases, the information obtained from the healthy state and from the different damage scenarios showed that the methodology is able to detect the simulated damages in the aluminum plate.
For all the experiments, the behavior was very similar: only a few machine architectures gave good results, namely Fine KNN, Weighted KNN, Bagged Trees, and Subspace KNN. The other types of machines did not work well in these experiments.
In all cases, the machines must be trained with data pre-processed in the same way as the data used to define the healthy state: changes such as the cable length or the use of the Golay filter are enough to change the resulting PCA model, so that the machines no longer work correctly.
PCA is a robust mechanism to characterize the data, since it was shown to eliminate the noise; however, more experiments that include environmental and operational noise are needed to determine the effectiveness of the algorithm.