Open access peer-reviewed chapter

Perspective Chapter: Computation of Wind Turbine Power Generation, Anomaly Detection and Predictive Maintenance

Written By

Cristian Bosch and Ricardo Simon-Carbajo

Submitted: 03 November 2022 Reviewed: 23 December 2022 Published: 23 January 2023

DOI: 10.5772/intechopen.109698

From the Edited Volume

Computational Semantics

Edited by George Dekoulis and Jainath Yadav


Abstract

Early power loss detection in wind turbines is key for the wind energy industry to avoid elevated maintenance costs and to reduce the uncertainty of generated power estimations. The location of wind farms, especially those isolated offshore, renders a scheduled-only maintenance strategy inefficient and very costly, typically with a long downtime after a breakdown. These problems motivate predictive solutions that anticipate the maintenance procedure, preparing the necessary parts and avoiding the possibility of destructive failures. Predicting failures in structures of such complexity requires modeling their multiple components individually in addition to the whole system. For this purpose, physics-based and data-driven models are used, and machine learning in particular has proven to be a valuable resource for solving a variety of problems in this industry. Thus, we will propose data-driven Deep Learning methods to compute the power output of wind turbines from all the mechanical and electrical features, using two types of Deep Neural Networks: a simpler combination of linear layers and a Long Short-Term Memory (LSTM) neural network. Then, with the use of a one-dimensional Convolutional Neural Network, we will predict the time to failure of the system.

Keywords

  • wind turbine
  • predictive maintenance
  • computational semantics
  • deep learning
  • LSTM
  • time series
  • regression
  • classification
  • CNN

1. Introduction

As per WindEurope [1], wind energy is the second biggest provider of energy in the European Union (EU), accounting for 18.8% of installed capacity, behind gas. Ireland, for instance, represents 3.5% of the EU’s combined capacity, and wind energy covers 28% of the country’s energy demand. In this context, it is worth noting that the maintenance cost of a wind turbine can range from 16% to 30% [2, 3] of the Levelized Cost Of Electricity (LCOE). Wind power technologies, on the rise despite being mature, would consolidate further by adopting solid solutions to the maintenance problem. Consequently, the predictive approach for wind turbines and farms has become a priority, introducing techniques that aim for downtime reduction and wind turbine lifespan extension.

While physics-based modeling systems exist, our purpose is to approach the problem through the application of Deep Learning (DL) algorithms to the data collected by the Condition Monitoring System (CMS) and the SCADA control system of several wind turbines. For this work, we have obtained data from an onshore Siemens SWT-2.3-101 wind turbine. The target is, based on historical data, to predict the anomalous behavior of the system and a fault with sufficient anticipation.

This chapter will explain the phases of data preparation, acknowledging the time series nature of the data, and the application of Deep Learning (DL) solutions. Two Deep Neural Network architectures designed for the data-driven prediction of the SCADA-measured “Power” output have been trained in a semi-supervised paradigm, assuming the training samples belong to what can be considered the normal state of the wind turbine. We will address whether there is a significant improvement when the regression on the power output is done with a basic Artificial Neural Network (ANN) mainly built on linear layers or with a memory cell, as is the case in Long Short-Term Memory ANNs (LSTMs). For the latter, our choice will be to apply Bidirectional LSTMs (BiLSTMs) directly, as they are expected to offer performance superior to that of the base architecture [4]. Then, the anomalies in the prediction of the validation and test datasets will be found using several methods that assume that, ideally, the deviations of the prediction from the real data follow a normal distribution (which will be the driver of the hyperparameter optimization). These methods include the standard deviation, a Monte Carlo dropout and a few-shot dropout using fewer predictions and fitting a Student’s t distribution on them, all three at 95% confidence level. The anomalies will be statistically analyzed afterwards, testing hypotheses about their relationship to the time-to-fault, which is highly complex to generalize in the case of these datasets, with errors of different origin and short periods between them (the longest period of constant performance being 11 days). The statistical significance of the anomaly appearance will be a motivator for building a one-dimensional Convolutional Neural Network (1-dim CNN) to predict downtime with an appropriate time-to-fault. In this classification task, we will consider the “Prefault” label of a processed sample as another hyperparameter and draw critical conclusions about its possible values. As a previous reference on our work with this dataset, we studied data augmentation through optimal transport dataset aggregation in [5].

The book chapter has the following structure: The subsequent subsection reviews the state of the art in wind turbine fault forecasting and general anomaly detection using several Artificial Intelligence methods. After that, we explain the methodology of our approach to solving this problem, analyzing the data with regards to discriminating them semantically as a preparation for the computation of the power output and the detection of anomalous behavior. Then, we explain the methodology for our data-driven regression of the power output with the use of two different deep neural networks and the extraction of the anomalies in the predicted data. After showing the statistical interest of these anomalies, we proceed to show our strategy to classify data samples with a 1D-CNN and present how an appropriate choice of metrics and hyperparameter space can help solve this problem. We finalize the book chapter by sharing the conclusions of our research.

1.1 State of the art

Wind power generators are composed of different rotating components that undergo extensive wear during their lifetime. Condition Monitoring Systems (CMSs) are common in modern industry and consist of a set of sensors that monitor the state of the turbine’s individual components in real time. The topic has been extensively reviewed in [6], where the advantages of Fault Detection Systems are outlined, ranging from cost reduction to the improvement of the Capacity Factor, since the ability to forecast an anomaly can optimize the moment when maintenance is applied, avoiding any stops at high energy output periods. A CMS can collect data from a wide range of sensors focused on vibration, component temperatures, oil levels and electrical voltages and currents. Fault forecasting can combine this strategy with monitoring processes affecting the wind turbine, such as: crack detection, strain, thermographic analysis, electrical conditions, signal and performance monitoring and acoustic analysis.

There are methods in the literature that focus on modeling the activity of the wind turbine parts by means of their physical behavior [7] and enhance this model with CMS data from the wind turbine, creating an approach favored by this hybridization. However, the challenge of our research is to consider only the data aggregated from the CMS to model the behavior of the wind turbines and make the model useful for predicting periods of downtime, determining whether this can be done in a general way or with specific information about the upcoming fault.

Concerning the employment of these data with ML algorithms, the problem of defect detection can be resolved with two different strategies: i) modeling the normal performance and detecting anomalies as they arise, and ii) evaluating data from the time span before faults to anticipate defective behavior. We will face the problem with one of these approaches or a combination of both.

Regarding anomaly detection, there is an advantage in modeling the normal output periods of a wind turbine, which is the use of most of the data collected by the CMS, as the datasets present a large imbalance with the normal regime being the most populated class. These are known as semi-supervised models, since faults and time ranges of data close to faults are purposely removed before training the algorithms. A representative solution of this strategy are autoencoders. Autoencoders are Deep Neural Networks (NNs) with a symmetrical architecture whose encoder layers converge to a bottleneck that stores the encoded representation of the data. These data are decoded afterwards, with an output that preserves the important features. Autoencoders have become a reference in early fault detection and have been proven capable of discriminating the parts originating the failure [8]. There are other possibilities for early anomaly detection in the literature: we can find NN architectures trained only with normal regime data that predict the power output expected at the next time iteration which, once compared against the actual generated output of the turbine, can determine if its behavior is unexpected. The parts responsible for such faulty behavior can be traced through a Principal Component Analysis (PCA) [9]. We will partially follow this technique, modeling the power output through our dataset and then deviating from that work in the way anomalies are studied. Classification methods that constrain normal behavior periods to be modeled only if far prior or far posterior to a fault have been proven to be a competent way of discriminating which SCADA-delivered features are worth considering [10].
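As a reference for this semi-supervised strategy, here is a minimal PyTorch sketch of an autoencoder-based detector; the layer sizes, feature count and 1.96σ threshold are our illustrative assumptions, not the architecture of [8]:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Symmetric encoder/decoder with a low-dimensional bottleneck."""
    def __init__(self, n_features: int, bottleneck: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, bottleneck), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 64), nn.ReLU(),
            nn.Linear(64, n_features),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Train on normal-regime samples only (semi-supervised), then flag
# samples whose reconstruction error stands out as anomalous.
model = Autoencoder(n_features=30)
criterion = nn.MSELoss(reduction="none")
with torch.no_grad():
    x = torch.randn(5, 30)                      # placeholder batch of SCADA features
    err = criterion(model(x), x).mean(dim=1)    # per-sample reconstruction error
    anomalous = err > err.mean() + 1.96 * err.std()
```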

Turning the focus to the historical faults of a device, the range of techniques is diverse too. The literature contains classifiers relying on supervised training, which uses datasets with every sample labeled. One example that differs from the analysis of SCADA data turns to visually inspecting the turbines with drones and then training Convolutional Neural Networks (CNNs) to detect usual damage indicators such as erosion or vortex generators with missing teeth [11]. These datasets, while based on image collection instead of SCADA, feature high data imbalance too, which requires compensation from software, leading to complex architectures for the CNNs proposed for the task [12]. With respect to using turbine sensor data in fully supervised training scenarios, multiclass classification with Support Vector Machines (SVMs) has been undertaken with simulated turbine data, showing success in discriminating faults according to their nature [13]. SVMs gained early popularity in the predictive maintenance field, though the main models currently employed are decision trees and gradient boosting, as shown in the benchmarking of Random Forest and XGBoost classifiers in [14]. Signal analysis has been experimented with too [15], where interference in the currents of the Doubly-Fed Induction Generator (DFIG), originated by the vibrations of the faulty gearbox, is studied through autoencoders and NN classifiers for anomaly detection.

Focusing more on the work related to anomaly detection [9], we can also find several applications of ANNs to different time series problems. We will follow the strategy of dividing the dataset into four pieces [16], emphasizing that training must be done with what is considered normal regime data [17], which we will apply to both our ANN based on linear layers and a BiLSTM. These methods rely on the deviations of the predictions from the real output falling in a normal distribution, so as to have a reliable way of computing confidence intervals for the anomalies to be spotted [18, 19]. The confidence intervals associated to the prediction will be computed making use of Monte Carlo dropout, activating the dropout layers at evaluation time for the computation of a large number of predictions, a method used by Uber [20]. In order to prove the viability of a less resource-consuming alternative, a small number of dropout-activated predictions will be computed and then fitted to a Student’s t distribution [21]. For a more complete analysis, we will also try to predict anomalous behavior using the global standard deviation of predictions in the normal validation dataset.

Wind turbine data feature engineering can be a complicated task. The idea of using CNNs, famous for extracting the interesting features of images, in the field of fault prediction has been explored in the literature too, as a feature information extraction tool to be combined with LSTMs [22] and as an independent predictor for software bugs [23]. Another model that combines adaptive feature engineering and a CNN for fault forecasting can be found in [24]. Techniques aiming for automated feature engineering and modeling are a greatly explored topic in the field of data science, as they ease building models when domain expertise is not available. Among these, a couple of very relevant toolkits are AutoML [25, 26] and the H2O.ai package [27, 28].


2. Methodology

2.1 Data

Our data originates from a Siemens SWT-2.3-101 turbine. Samples were collected every ten minutes and the dataset spans nearly four years. From the features included in the dataset, we choose those that refer to weather conditions (wind speed, temperature, etc.), mechanical measurements (gear bearing temperatures, blade angles or pressure, etc.) or electrical measurements (different voltages and currents) to train our models.

The dataset is cleaned and labeled according to the status flag associated to each sample and assumptions with regards to posterior status flags. The computation of the regression requires a semi-supervised approach: after an initial split into training, validation and test sets, with the test set being a fourth of the original data and the validation set a fourth of the remainder, we purge from the training data whatever we cannot confirm as well-behaved samples, and then we do the same in the validation dataset, thus creating one version with only well-behaved data alongside the full validation dataset. All these portions are chronologically ordered, since we are dealing with a time series. This could have implications when training the BiLSTM, as there would be cuts due to the preprocessing purges. We will later assess whether this has caused relevant effects on the models trained. A minimal sketch of this split is shown below.
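The following sketch assumes a chronologically ordered pandas DataFrame with a hypothetical boolean column well_behaved marking confirmed normal samples; holding out the final quarter as the test set is also our assumption:

```python
import pandas as pd

def chronological_split(df: pd.DataFrame):
    """Split a time-ordered dataset: last quarter -> test,
    a quarter of the remainder -> validation, rest -> training."""
    n = len(df)
    test = df.iloc[int(0.75 * n):]
    remainder = df.iloc[:int(0.75 * n)]
    m = len(remainder)
    validation_full = remainder.iloc[int(0.75 * m):]
    train = remainder.iloc[:int(0.75 * m)]

    # Purge samples not confirmed as well-behaved from the training set,
    # and keep both a purged and a full copy of the validation set.
    train_normal = train[train["well_behaved"]]
    validation_normal = validation_full[validation_full["well_behaved"]]
    return train_normal, validation_normal, validation_full, test

# Toy usage with placeholder data.
df = pd.DataFrame({"WindSpeed": range(100), "Power": range(100),
                   "well_behaved": [True] * 90 + [False] * 10})
train_n, val_n, val_full, test = chronological_split(df)
```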

Since we are using a real industrial dataset that has not been curated for research purposes, there is no labeling beyond the indication of a fault happening (status flag). The definition of the normal or good behavior we seek to isolate is not clear (normal does not mean belonging to a Gaussian distribution here). As a way to deal with this, we define normal data based on the number of days before a fault is registered. This number of days will be considered a hyperparameter and thus the labeling process is included as part of the optimization, introducing no human bias beyond using full days as a time reference, which is a flexible enough decision. Trimming the data this way ensures that the power regression is computed with samples that are not semantically different, which will help achieve a well-behaved prediction distribution. A scaler is fit on what is considered normal data and the transformation is applied to the whole dataset, as the normal data is confirmed not to contain any outliers on features that would provoke a loss of information after scaling with the use of extreme points. The posterior classification task will have a similar labeling strategy, where a logistic regression will be performed to classify each sample as either “Normal” (0) or “Prefault” (1). These prefault periods will be determined by a hyperparameter as well, which will then be determined by the best model during optimization.
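A sketch of this labeling and scaling logic follows; only the days-before-a-fault criterion is shown, and all names (label_normal, the feature column, the fault timestamps) are our hypothetical choices:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def label_normal(df, fault_times, days_before_fault):
    """Mark a sample as normal only when it lies more than
    `days_before_fault` days before the next registered fault."""
    faults = np.sort(np.asarray(fault_times, dtype="datetime64[ns]"))
    ts = df.index.values
    idx = np.searchsorted(faults, ts)            # index of the next fault
    gap = np.full(len(ts), np.inf)               # days until that fault
    has_next = idx < len(faults)
    gap[has_next] = (faults[idx[has_next]] - ts[has_next]) / np.timedelta64(1, "D")
    out = df.copy()
    out["normal"] = gap > days_before_fault      # hyperparameter-driven label
    return out

# Toy usage on ten-minute samples with one registered fault.
times = pd.date_range("2015-01-01", periods=1000, freq="10min")
df = pd.DataFrame({"WindSpeed": np.random.rand(1000)}, index=times)
df = label_normal(df, [pd.Timestamp("2015-01-04 12:00")], days_before_fault=3)

# Fit the scaler on normal data only; transform the whole dataset.
feature_cols = ["WindSpeed"]
scaler = StandardScaler().fit(df.loc[df["normal"], feature_cols])
df[feature_cols] = scaler.transform(df[feature_cols])
```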

As we are aiming for a full data-driven regression, after the selection of the features, there will be no dimensionality reduction as we want to include every detail in the prediction of the power. This contrasts with theoretical approaches that would try to reproduce the power curve, which is considered a relationship mostly exclusive between “Wind Speed” and “Power”.

Moving to statistical features of the data, such as whether they are normally distributed, the wind turbine works within a regime that makes the power curve a relation but not a function (Figure 1), since each value of the domain can correspond to several values of the codomain. As we will expand later, this has significance when the intent is to have a regression whose errors are normally distributed. Our intent is that the deviations of the real data from the prediction (on the well-behaved regime used for training and validation of the regression) are normally distributed which, due to the power output being multivalued, is not trivial (see Figure 2).

Figure 1.

Power curve, relating the wind speed and the power generated by the turbine.

Figure 2.

Wind speed (left) and power (right) histograms.

Considering that the dataset only once shows a period of 11 days without any downtime and that we assume that, for the convenience of scheduling the maintenance, at least 24 hours of anticipation are needed, we will set the days-of-well-behaved-regime hyperparameter between 2 and 6 days before a fault, so the optimization will decide the best possible outcome without dramatically slicing the amount of training samples.

The metrics of the regression will be measured on the well-behaved split of the validation dataset, and the full validation dataset will be used to compute statistics referring to the appearance of anomalies in the prediction of the power output. An anomaly will be defined as real data escaping the prediction intervals computed with statistical significance by the regression and posterior techniques.

Once the anomalies in the regression of the power have been computed, this anomalous data will be included in the 1D-CNN as features too, simply as the deviations between registered and predicted power for each sample. As we aim to predict a failure with an anticipation that is suitable for performing early maintenance, which is a very complicated task, we will try to compensate with feature engineering by adding rolling averages of the different original features to the data, in an agnostic manner (see the sketch below). The time windows will be considered hyperparameters and these newly engineered features will be created anew for each hyperparameter optimization step, whereas the original features will remain constant during training of the CNN.
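A sketch of this rolling-average feature engineering, with hypothetical feature names and placeholder window sizes (in ten-minute samples) standing in for the optimizer's proposals, could be:

```python
import pandas as pd

def add_rolling_means(df, windows):
    """Append one rolling-average column per feature; the window length
    is a separate hyperparameter for each original feature."""
    out = df.copy()
    for feature, w in windows.items():
        out[f"{feature}_roll{w}"] = df[feature].rolling(window=w, min_periods=1).mean()
    return out

# Toy usage; the window sizes are placeholders the optimizer would set.
df = pd.DataFrame({"WindSpeed": [5.0] * 10, "GearBearingTemp": [60.0] * 10})
df_fe = add_rolling_means(df, {"WindSpeed": 36, "GearBearingTemp": 144})
```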

2.2 Deep learning models

Our first goal is to find deviations between a data-driven regression of the power prediction and the real power output feature registered by the SCADA monitoring system. The next step will be to study these deviations statistically and determine whether their appearance becomes significant as the time before a fault gets shorter. Finally, we will build a classification model of the data samples for pinpointing an impending downtime of the wind turbine with sufficient anticipation. To succeed in all these tasks, we will make use of three different neural network architectures. The power regression will be performed through two different approaches: a deep neural network (NN) made of a succession of Linear and Dropout layers (with ReLU activation layers within), and a BiLSTM with dropout, to check whether the inclusion of a memory cell can improve the regression even if we are not interested in using previous power predictions (autoregression). Power will always be our dependent variable. In a sense, we are extending the “Power curve” concept of computing power from wind speed, but with the corrections of the other features included. As we mentioned before, creating the training dataset for this regression implied cutting out many samples, which could go against the philosophy of using recurrent neural networks. However, a good performance in the first simple NN attempt would make these cuts irrelevant.

The architecture of the simpler regression model is built using the PyTorch [29] ModuleList class, which allows us to build the NN with generality, determining the set of hyperparameters that defines it by using Bayesian optimization with Weights & Biases [30]. The BiLSTM will be defined using the LSTM class from PyTorch, with the dropout and bidirectional parameters set to true (a minimal sketch of both declarations follows the hyperparameter list below). These NNs will have the following hyperparameters:

Both architectures:

  • Dropout probability.

  • Hidden layer size.

ANN:

  • Number of fully connected layers.

BiLSTM:

  • Number of recurrent layers.

  • Tensor time window dimension size.

Other hyperparameters related to their training are:

  • Batch size.

  • Learning rate.

  • Separation between well-behaved data and faults (in days).
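As a reference, a minimal PyTorch sketch of how both regressors can be declared from these hyperparameters is shown here; the class names, the ReLU/Dropout ordering and the choice of predicting from the last window step are our illustrative assumptions, not the exact implementation used in the experiments:

```python
import torch.nn as nn

class PowerRegressor(nn.Module):
    """Fully connected regressor built with ModuleList so that depth,
    width and dropout probability stay generic hyperparameters."""
    def __init__(self, n_features, hidden_size, n_layers, p_drop):
        super().__init__()
        sizes = [n_features] + [hidden_size] * n_layers
        self.layers = nn.ModuleList()
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
            self.layers.extend([nn.Linear(fan_in, fan_out), nn.ReLU(), nn.Dropout(p_drop)])
        self.head = nn.Linear(hidden_size, 1)      # single "Power" output

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return self.head(x)

class PowerBiLSTM(nn.Module):
    """Bidirectional LSTM over a window of consecutive samples."""
    def __init__(self, n_features, hidden_size, n_layers, p_drop):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers=n_layers,
                            dropout=p_drop, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden_size, 1)  # 2x: forward + backward states

    def forward(self, x):                          # x: (batch, window, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])            # predict from the last step
```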

As previously shown, Power is not Gaussian distributed. This may affect our predictions as their deviations from the real data may not be normally distributed either. However, we need them to be if we want to spot anomalies in the prediction. Thus, we will establish a custom metric that ensures this requisite is met, with the following definition:

$$\text{metric} = \left(r^{2}\right)^{100 \times \text{Validation loss}} \tag{1}$$

where r² is the coefficient of determination in a Q-Q plot representing the deviations of the prediction with respect to the real “Power” feature in the well-behaved validation dataset against the 45° line. The data included in this Q-Q plot is extracted by modifying the StatsModels library [31] so as to retrieve the slope and ordinate at the origin of the “s” line when building the plot. By requesting the maximization of this custom metric, since 0 < r² < 1, we are ensuring that the prediction errors in the normal dataset follow a Gaussian distribution and, at the same time, NNs that present a high validation loss are penalized, with the loss being the Mean Square Error.
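Since the exact StatsModels modification is not reproduced here, a lighter sketch of an equivalent computation with scipy.stats.probplot (our substitution, which returns the correlation coefficient of the Q-Q fit directly) could look as follows:

```python
import numpy as np
from scipy import stats

def gaussianity_metric(residuals, val_loss):
    """Custom optimization target of Eq. (1): (r^2)^(100 * validation loss),
    where r is the correlation coefficient of the Q-Q plot of residuals
    against a normal distribution."""
    (osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm")
    return (r ** 2) ** (100 * val_loss)

# Toy usage with placeholder residuals and a placeholder MSE value.
residuals = np.random.normal(size=1000)
print(gaussianity_metric(residuals, val_loss=0.001))
```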

After the optimization of these NNs, a prediction interval based on the standard deviation of these prediction errors will be computed for both the full validation and test sets. Then, a Student’s t prediction interval based on 10 runs with the Dropout layers in training mode (at evaluation time) and a full Monte Carlo prediction interval with 100 runs of Dropout in training mode will be added. These last two intervals have the advantage of changing sample by sample, instead of being completely general as in the case of the standard deviation. We will define all three prediction intervals at a 95% confidence level, as we want to prove that the timing of these anomalies with respect to a fault is statistically significant.

ANOVA tests will be performed on both the full validation and test datasets, under the hypothesis that anomalies have different distributions according to the Remaining Useful Life (RUL) or time-to-fault, considering the total anomalies recorded in a time series slice between two faults relative to the total samples contained in said slice. A sketch of such a test is shown below.
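This is how such a one-way ANOVA could be run with scipy.stats.f_oneway; the grouping of per-slice anomaly rates into RUL bins and the values themselves are our illustrative assumptions:

```python
from scipy import stats

# Hypothetical per-slice anomaly rates grouped into RUL bins.
anomaly_rate_by_rul = {
    ">3 days":  [0.02, 0.03, 0.01, 0.04],
    "1-3 days": [0.05, 0.06, 0.04, 0.07],
    "<1 day":   [0.09, 0.12, 0.08, 0.11],
}
# Null hypothesis: all RUL groups share the same mean anomaly rate.
f_stat, p_value = stats.f_oneway(*anomaly_rate_by_rul.values())
print(f"one-way ANOVA: F = {f_stat:.2f}, p = {p_value:.3f}")
```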

Regarding the classification task, our choice for performing logistic regression on the data as normal or prefault has been a TensorFlow [32] based CNN architecture. Since we are aware of the challenge inherent to fault forecasting, we decided to directly implement an architecture that extracts higher-level features but is otherwise quite simple: four one-dimensional convolutional layers with a max pooling layer after each pair of them. We will train by minimizing validation loss (binary cross-entropy), despite it not being the focus of interest in our classification.
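A minimal Keras sketch of this layout follows; the filter counts, kernel sizes, padding and the global-average-pooling head are placeholder assumptions (in the actual experiments, kernel sizes and filters are hyperparameters):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_cnn(window, n_features, filters=32, kernel_size=3):
    """Four Conv1D layers with max pooling after each pair, then a
    sigmoid head for the normal/prefault logistic output."""
    model = tf.keras.Sequential([
        layers.Input(shape=(window, n_features)),
        layers.Conv1D(filters, kernel_size, activation="relu", padding="same"),
        layers.Conv1D(filters, kernel_size, activation="relu", padding="same"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(2 * filters, kernel_size, activation="relu", padding="same"),
        layers.Conv1D(2 * filters, kernel_size, activation="relu", padding="same"),
        layers.MaxPooling1D(pool_size=2),
        layers.GlobalAveragePooling1D(),
        layers.Dense(1, activation="sigmoid"),     # logistic output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

model = build_cnn(window=64, n_features=30)        # placeholder dimensions
```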

We will evaluate a list of metrics. The most usual in classification tasks, precision, recall and f1-score, will of course be computed, according to:

$$\text{Precision} = \frac{TP}{TP+FP} \qquad \text{Recall} = \frac{TP}{TP+FN} \qquad \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision}+\text{Recall}} \tag{2}$$

where TP, FP and FN are “True Positives”, “False Positives” and “False Negatives”, respectively.

However, our hyperparameter optimization goal will be to maximize the Matthews correlation coefficient (MCC), since it is more complete by taking into consideration “True Negative” (TN) predictions, as shown in Eq. 3. This is a very appropriate metric for our problem, as we have no previous knowledge of the correct labeling, and we are facing a dataset that can become greatly imbalanced towards the well-behaved or normal label of the turbine.

$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \tag{3}$$
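As a toy scikit-learn illustration (our numbers, not chapter data) of why the MCC is the safer target: when most samples carry the “prefault” (1) label, predicting 1 everywhere inflates the f1-score, while the MCC, which also uses true negatives, collapses to zero:

```python
from sklearn.metrics import f1_score, matthews_corrcoef

# Imbalanced toy labels and a degenerate all-ones predictor.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1] * 10
print(f1_score(y_true, y_pred))            # ~0.89, deceptively good
print(matthews_corrcoef(y_true, y_pred))   # 0.0, no real discrimination
```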

3. Results

3.1 Regression with a deep NN

After 100 runs of a Sweep (the term used in Weights & Biases for a search in the hyperparameter space) with Bayesian optimization, the value obtained for the metric shown in Eq. 1 is metric = 0.98385. This metric has the double goal of favoring models that are statistically appropriate for anomaly extraction and that fit the data correctly. Thus, a metric this close to 1 proves that a NN without a memory cell can fit the multivariate Power curve excellently. In Figure 3, we present the Q-Q plot showing how the deviations of our regression from the real “Power” values belong to a Gaussian distribution (computed with the well-behaved data of the validation dataset).

Figure 3.

Q-Q plot of the deviations between prediction and real data in the well-behaved wind turbine part of the validation dataset.

Since the model has shown a good fit between Gaussianity and prediction deviations, the next step is to study the whole validation dataset, which includes data close in time to faults, as we want to prove that these data present anomalies. Our statistical study will examine the deviations between data and prediction in the following ways: using a Monte Carlo dropout with 100 iterations and establishing a threshold of 1.96 times the standard deviation (95%) of the predictions for each sample; then computing a smaller Monte Carlo dropout sample of only 10 iterations and obtaining the Student’s t distribution at a 95% confidence level; and using 1.96 times the standard deviation of the global error-in-prediction distribution. These three strategies are presented in order of increasing speed of computation. A Monte Carlo dropout consists of setting the dropout layers of our NN architecture in train mode and performing a number of predictions for every sample, which can be quite slow (see the sketch below). Therefore, it is interesting to find a lighter process to find an anomaly, such as computing the confidence interval defined by the mentioned Student’s t distribution (Eq. 4) with a very limited Monte Carlo sampling.
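A minimal PyTorch sketch of the Monte Carlo dropout procedure follows; the toy model and the per-sample 1.96σ band are our illustrative assumptions:

```python
import torch

def mc_dropout_interval(model, x, n_iter=100, z=1.96):
    """Repeat the forward pass with dropout kept stochastic; the spread
    of the n_iter predictions gives a per-sample 95% interval."""
    model.eval()
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()                      # only dropout stays in train mode
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_iter)])
    mean, std = preds.mean(dim=0), preds.std(dim=0)
    return mean - z * std, mean + z * std

# Toy usage: flag anomalies where the real power escapes the interval.
model = torch.nn.Sequential(torch.nn.Linear(30, 64), torch.nn.ReLU(),
                            torch.nn.Dropout(0.2), torch.nn.Linear(64, 1))
x_batch = torch.randn(8, 30)               # placeholder feature batch
y_real = torch.randn(8, 1)                 # placeholder real power
low, high = mc_dropout_interval(model, x_batch)
anomaly = (y_real < low) | (y_real > high)
```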

$$\text{CI}_{t}(0.95) = \bar{x}_{\text{prediction}} \pm t_{\alpha/2,\,n-1} \times s\sqrt{1+\frac{1}{n}} \tag{4}$$

where s is the standard deviation of the n samples used (10 in our case) and t_{α/2, n−1} is the corresponding critical value of the Student’s t distribution with n−1 degrees of freedom.
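A sketch of Eq. (4) with SciPy, using placeholder prediction values in place of the ten dropout-activated runs, could be:

```python
import numpy as np
from scipy import stats

def t_prediction_interval(preds, alpha=0.05):
    """Eq. (4): per-sample interval from a small Monte Carlo dropout
    sample, using the Student's t critical value with n-1 d.o.f."""
    preds = np.asarray(preds)
    n = preds.shape[0]
    mean = preds.mean(axis=0)
    s = preds.std(axis=0, ddof=1)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    half = t_crit * s * np.sqrt(1 + 1 / n)
    return mean - half, mean + half

# Ten dropout-activated predictions for five samples (placeholder values).
preds = np.random.normal(loc=1500.0, scale=30.0, size=(10, 5))
low, high = t_prediction_interval(preds)
```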

Any prediction exceeding the high or low limits established by these methods will be considered an anomaly according to those particular statistics. Once these anomalies have been pinpointed, our interest moves towards determining whether they arise significantly as the time-to-fault decreases. We present the results of these ANOVA tests in Table 1 with their statistical significance.

Anomaly method                     One-way ANOVA result    p-value
Monte Carlo (100 iter.)            Positive                0.143
Monte Carlo (10 iter., t-Stud.)    Positive                0.143
Standard deviation                 Positive                0.14

Table 1.

ANOVA results for the hypothesis that anomalies appear as time gets closer to the fault.

As we can see, the deep NN is successful enough both for performing an accurate regression of the power output and for finding statistically significant anomalies. Let us now compare these results with those obtained by training the BiLSTM for regression. This comparison is relevant as both NNs differ greatly in training and prediction times, the BiLSTM being much slower and requiring more resources. First of all, we present in Figure 4 the Q-Q plot of the best model obtained after the Bayesian optimization of the hyperparameters. This model has a metric (defined by Eq. 1) value of metric = 0.9732, which is still a good result for our purposes, though worse than the one previously obtained and with the drawback of a much slower training. As seen in the figure, the Q-Q plot does not fit the Gaussianity line as well in the deviation between prediction and real value.

Figure 4.

Q-Q plot of the deviations between prediction and real data in the well-behaved wind turbine part of the validation dataset for the BiLSTM.

We reproduce the one-way ANOVA tests the same way as with the previous architecture, presenting the results in Table 2.

Anomaly method                     One-way ANOVA result    p-value
Monte Carlo (100 iter.)            Positive                0.018
Monte Carlo (10 iter., t-Stud.)    Positive                0.018
Standard deviation                 Positive                0.018

Table 2.

ANOVA results for the hypothesis that anomalies appear as time gets closer to the fault, for the BiLSTM architecture.

This time, results are not favorable to the null hypothesis. The metric and the Q-Q plot were not as good as with the previous architecture, which may make the anomalies less reliable than those predicted by the simpler ANN, since the deviations in the validation split that only contains good behavior of the turbine do not fit well into a Gaussian distribution, which is required to obtain reliable anomalies. It is also relevant to recall the computational semantics of data preprocessing at this stage, as the isolation of well-behaved data for the training split causes cuts that can affect the memory cells of the architecture. Throughout this research, LSTMs have proven difficult to manage in terms of reproducibility, as determinism is difficult to achieve, and we included dropout to have the chance of doing the Monte Carlo sampling of predictions.

Nevertheless, these results are motivating if contradictory, which suggests the need for a model that arbitrates whether it is possible to predict a fault and arrange maintenance with enough anticipation. This is where the 1-dim CNN enters. To recap the previous discussion, this is a challenging dataset where feature engineering is not an easy task and “Power” is dominated by the “Wind Speed”. The use of convolution takes care of part of the feature engineering, and the rolling averages with time windows defined by hyperparameters (a different one for each of the original features) ensure a non-biased feature extraction. This agnosticism is also represented in the CNN architecture, where kernel sizes and filters are defined as hyperparameters too. This search of the hyperparameter space has a dimensionality too high for Bayesian optimization to work, so we will perform an extensive random search.

Since one of our purposes is to prove the convenience of the MCC metric in classification tasks, our figures will present both MCC and f1-score, showing that the latter can lie in a range that is considered good despite the former being too far from 1. The Matthews correlation coefficient ranges within −1 < MCC < 1, where values close to zero indicate a bad fit; it is considered good when MCC > 0.5 (negative values mean anticorrelation). We show the values of MCC and f1-score in Figure 5 according to the time-to-fault, which we plot as the number of samples labeled as 1 (we are performing logistic regression) from the fault backwards in time, which is the anticipation we seek for maintenance.

Figure 5.

MCC (left) and F1-score (color) with respect to time-to-fault (as samples labeled 1).

There are interesting results in this plot. There is a curve where most of the models fall, showing a steady increase in both metrics. This increase is explained by more data samples being labeled as 1 or “Prefault”, which biases the model, making it seemingly more accurate. Among the outliers to this curve, one is very interesting, as it greatly exceeds the curve at an interesting time-to-fault. The metrics for this particular hyperparameter configuration are:

MCC = 0.6418

F1-Score = 0.8701

Time-to-fault = 197 samples (~1 day 9 h).

It must be said that at this time-to-fault the labeling of the whole dataset is very balanced, with the “prefault” (1) samples making up around 48% of the data. As it has been a very specific result, without nearby hyperparameter space realizations with similar metrics, we then fix the time-to-fault and scan the remainder of the hyperparameter space to determine whether more models converge to the metrics found. The results, shown in Figure 6, prove that after 400 random hyperparameter runs, a good f1-score is commonly achieved, but it is indeed complicated to reach a good MCC metric (models with worse metrics than shown are cut from the figure).

Figure 6.

MCC with respect to the F1-score for a fixed time-to-fault of 197 samples (colors represent validation loss).

Thus, it is proven that, despite the difficulty of this endeavor, it is possible to train a 1-dimensional CNN to reliably predict wind turbine faults causing downtime and schedule maintenance with at least more than a day of anticipation. In Figure 5 we can see other time-to-fault values that are promising too, though it is understandably more complicated to predict an error the further back we move in time. For this purpose, it is highly recommendable to use the Matthews correlation coefficient instead of relying solely on the f1-score, as it is a more complete metric that includes True Negative samples in its computation.


4. Conclusions

Wind turbine datasets entail a high complexity for succeeding in the task of predictive maintenance. Throughout this chapter, we have proven that an artificial neural network can be built so as to train a regressor for the power output of the wind turbine according to the other features, where, as in theoretical power curves, the main influence is traditionally the wind speed. The smart definition of metrics is the best ally to obtain a model that fits the target variable with high accuracy and can be used for computing anomalies, which requires deviations in the prediction of “Power” to fall in a Gaussian distribution for the wind turbine regime considered normal, i.e., without any fault in the time vicinity. Besides, we have shown that training an LSTM increases the difficulty of achieving these goals, requiring more computing time and resources to achieve a subpar result compared to that of the simpler NN architecture.

In addition to the regression of the power, we have developed a one-dimensional CNN architecture capable of, after an extensive hyperparameter optimization, classifying any new registered data sample as a normal state or as indicative of an impending fault that will cause downtime, with at least 1 day and 9 hours of anticipation. This was our main purpose, and for its achievement it has been necessary both to solve feature engineering through convolution and, as the original data is not labeled (only faults are annotated once they happen), to find the correct annotation of samples through optimization with a powerful metric that is robust against class imbalance, such as the Matthews correlation coefficient.

To sum up, the problem of predictive maintenance without the aid of domain expertise or annotated training data can be solved with patient hyperparameter optimization and the evaluation of strategic metrics powerful enough to train our neural networks correctly for the task.


Acknowledgments

The authors thank Enterprise Ireland and the European Union’s Horizon 2020 research and innovation programme for funding under the Marie Skłodowska-Curie grant agreement No. 713654.


Conflict of interest

The authors declare no conflict of interest.

References

  1. WindEurope Business Intelligence. Wind Energy in Europe in 2018: Trends and Statistics. windeurope.org; February 2019
  2. Taylor M, Ralon P, Al-Zoghoul S, Jochum M, Gielen D. Renewable Power Generation Costs in 2021. IRENA; 2022
  3. Feng Y, Tavner PJ, Long H. Early experiences with UK round 1 offshore wind farms. Proceedings of the Institution of Civil Engineers - Energy. 2010;163:167-181
  4. Siami-Namini S, Tavakoli N, Namin AS. The performance of LSTM and BiLSTM in forecasting time series. In: 2019 IEEE International Conference on Big Data (Big Data). 2019. pp. 3285-3292
  5. Bosch C, Simon-Carbajo R. Machine learning for wind turbine fault prediction through the combination of datasets from same type turbines, floating offshore energy devices: GREENER. Materials Research Proceedings. 2022;20:45-57
  6. Hameed Z, Hong YS, Cho YM, Ahn SH, Song CK. Condition monitoring and fault detection of wind turbines and related algorithms: A review. Renewable and Sustainable Energy Reviews. 2009;13:1-39
  7. Breteler D, Kaidis C, Tinga T, Loendersloot R. Physics based methodology for wind turbine failure detection, diagnostics & prognostics. EWEA 2015. 2015:1-9
  8. Zhao H, Liu H, Hu W, Yan X. Anomaly detection and fault analysis of wind turbine components based on deep learning network. Renewable Energy. 2018;127:825-834
  9. Mazidi P, Bertling-Tjernberg L, Sanz-Bobi MA. Performance analysis and anomaly detection in wind turbines based on neural networks and principal component analysis. In: 12th Workshop on Industrial Systems and Energy Technologies (JOSITE2017). Madrid: comillas.edu; 2017
  10. Felgueira T, Rodrigues S, Perone CS, Castro R. The impact of feature causality on normal behaviour models for SCADA-based wind turbine fault detection. In: ICML 2019 Workshop on Climate Change: How Can AI Help? arXiv:1906.12329; 2019
  11. Shihavuddin ASM, Chen X, Fedorov V, Christensen AN, Riis NAB, Branner K, et al. Wind turbine surface damage detection by deep learning aided drone inspection analysis. Energies. 2019;12:676
  12. Anantrasirichai N, Bull D. DefectNET: Multi-class fault detection on highly-imbalanced datasets. In: 2019 IEEE International Conference on Image Processing (ICIP). 2019. pp. 2481-2485
  13. Mokhtari A, Belkheiri M. Fault diagnosis of a wind turbine benchmark via statistical and support vector machine. International Journal of Engineering Research in Africa. 2018;37:29-42
  14. Zhang DH, Qian LY, Mao BJ, Huang C, Huang B, Si YL. A data-driven design for fault detection of wind turbines using random forests and XGBoost. IEEE Access. 2018;6:21020-21031
  15. Cheng FZ, Wang J, Qu LY, Qiao W. Rotor current-based fault diagnosis for DFIG wind turbine drivetrain gearboxes using frequency analysis and a deep classifier. In: 2017 IEEE Industry Applications Society Annual Meeting. 2017. pp. 1062-1071
  16. Chauhan S, Vig L. Anomaly detection in ECG time signals via deep long short-term memory networks. In: 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA). 2015. pp. 1-7
  17. Maya S, Ueno K, Nishikawa T. dLSTM: A new approach for anomaly detection using deep learning with delayed prediction. International Journal of Data Science and Analytics. 2019;8:137-164
  18. Bakhtawar Shah M. Anomaly Detection in Electricity Demand Time Series Data. Sweden: KTH Royal Institute of Technology; 2019
  19. Malhotra P, Vig L, Shroff G, Agarwal P. Long short term memory networks for anomaly detection in time series. In: 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges: ESANN; 2015. pp. 89-94
  20. Zhu L, Laptev N. Deep and confident prediction for time series at Uber. In: 2017 IEEE International Conference on Data Mining Workshops (ICDMW). New Orleans, LA, USA: IEEE; 2017. pp. 103-110
  21. Hill DJ, Minsker BS. Anomaly detection in streaming environmental sensor data: A data-driven modeling approach. Environmental Modelling and Software. 2010;25:1014-1022
  22. Zheng L, Xue W, Chen F, Guo P, Chen J, Chen B, et al. A fault prediction of equipment based on CNN-LSTM network. In: 2019 IEEE International Conference on Energy Internet (ICEI). 2019. pp. 537-541
  23. Al Qasem O, Akour M. Software fault prediction using deep learning algorithms. International Journal of Open Source Software and Processes. 2019;10:1-19
  24. He J, Wang J, Dai L, Zhang J, Bao J. An adaptive interval forecast CNN model for fault detection method. In: 15th IEEE International Conference on Automation Science and Engineering (CASE). IEEE; 2019. pp. 602-607
  25. Guyon I, Bennett K, Cawley G, Escalante HJ, Escalera S, Tin Kam H, et al. Design of the 2015 ChaLearn AutoML challenge. In: 2015 International Joint Conference on Neural Networks (IJCNN). Killarney, Ireland: IJCNN; 2015. pp. 1-8. DOI: 10.1109/IJCNN.2015.7280767
  26. Guyon I, Sun-Hosoya L, Boullé M, Escalante HJ, Escalera S, Liu Z, et al. Analysis of the AutoML challenge series 2015-2018. In: Hutter F, Kotthoff L, Vanschoren J, editors. Automated Machine Learning. Cham: Springer; 2019. pp. 177-219
  27. Stetsenko P. Machine Learning with Python and H2O. docs.h2o.ai; 2020. Available from: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/faq/general.html#i-am-writing-an-academic-research-paper-and-i-would-like-to-cite-h2o-in-my-bibliography-how-should-i-do-that
  28. Stetsenko P. Machine Learning with Python and H2O. 2020. Available from: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/faq/general.html#i-am-writing-an-academic-research-paper-and-i-would-like-to-cite-h2o-in-my-bibliography-how-should-i-do-that
  29. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An imperative style, high-performance deep learning library. In: Wallach H, Larochelle H, Beygelzimer A, Garnett R, editors. Advances in Neural Information Processing Systems 32. Vancouver, Canada: Curran Associates Inc.; 2019. pp. 8024-8035
  30. Biewald L. Experiment Tracking with Weights and Biases. Weights & Biases; 2022. [Online]. Available from: wandb.com
  31. Seabold S, Perktold J. Statsmodels: Econometric and statistical modeling with Python. In: Proceedings of the 9th Python in Science Conference (SciPy); 2010
  32. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. 2016. arXiv:1603.04467
