A Wavelet Threshold Function for Treatment of Partial Discharge Measurements

Based on wavelet transform filtering theory, this chapter describes the development of a wavelet threshold function intended for denoising partial discharge measurements. The new function, named the Fleming threshold, is based on the logistic function, which is well known for its usefulness in several important areas. Variations in the application of the Fleming function are also presented, in an attempt to identify the decomposition levels where the thresholding process must be more stringent and those where it can be more lenient, which increases its effectiveness in removing noisy coefficients. The proposed function and its variants show excellent results compared to other wavelet thresholding methods described in the literature, including the well-known Hard and Soft functions.


Introduction
The analysis of Partial Discharges (PD), which occur at imperfections in the insulation of high-voltage (HV) equipment, has gained global acceptance as an important tool for the predictive diagnosis of the operating condition of such equipment. It allows measures to be taken that safeguard both the equipment and the power supply quality of the electrical system.
PD are short-duration impulsive signals and can therefore be detected over a wide frequency range, from a few kHz to GHz. Normally, there is a direct relationship between the frequency range with the highest incidence of PD pulses and the type of high-voltage equipment evaluated. For example, transformers and generators usually emit pulses from a few tens of kHz up to about 30 MHz [1], whereas Gas Insulated Substations (GIS) are affected by very fast pulses ranging from 300 MHz to 3 GHz, and for cables the spectrum covers frequencies from 300 MHz to 1 GHz.
Figure 1 shows two examples of PD pulses measured in two different pieces of HV equipment, a GIS and a hydro generator. Note the marked presence of white noise. The pulses normally have either an exponentially damped oscillatory shape or a purely exponentially damped shape [2].
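The two pulse shapes mentioned above can be sketched numerically. The following Python snippet is only a minimal illustration and not the simulation procedure used in this chapter: the function names, the double-exponential and damped-sine expressions, and every parameter value (time constants, oscillation frequency, sampling rate, onset sample, noise level) are assumptions chosen to produce plausible waveforms.

```python
import math
import random


def damped_exponential_pulse(t, amplitude=1.0, tau_rise=1e-8, tau_fall=1e-7):
    """Assumed double-exponential model: fast rise, slower decay (t in seconds)."""
    if t < 0.0:
        return 0.0
    return amplitude * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))


def damped_oscillatory_pulse(t, amplitude=1.0, tau=2e-7, freq=10e6):
    """Assumed damped-sine model: exponential envelope times a sinusoid."""
    if t < 0.0:
        return 0.0
    return amplitude * math.exp(-t / tau) * math.sin(2.0 * math.pi * freq * t)


def simulate_noisy_pulse(pulse, n=1024, fs=1e9, onset=200, noise_std=0.05, seed=0):
    """Sample `pulse` starting at sample `onset` and add Gaussian white noise."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    clean = [pulse((k - onset) / fs) for k in range(n)]
    noisy = [s + rng.gauss(0.0, noise_std) for s in clean]
    return clean, noisy


# Example: one damped oscillatory pulse immersed in white noise.
clean, noisy = simulate_noisy_pulse(damped_oscillatory_pulse)
```

Passing `damped_exponential_pulse` instead of `damped_oscillatory_pulse` gives the non-oscillatory shape with the same noise model.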
The proper diagnosis of equipment is closely related to the peak amplitude and shape of the detected pulses. Therefore, it is important to preserve the amplitude characteristics of the signal (especially the peaks), providing a higher Signal to Noise Ratio (SNR) and a lower Amplitude Error (AE).
FFT and STFT filtering are not as effective for non-stationary, transient, and stochastic signals such as PD [3], since these transforms do not allow localization in the time and frequency domains in the way the wavelet transform does [4,5] (with better frequency resolution and worse time resolution for the low-frequency components of the signal, and worse frequency resolution and better time resolution for the high-frequency components). Therefore, the performance of these methods is limited in comparison with wavelet denoising, which adapts itself to the signal.
Partial discharges are almost always detected and quantified electrically, which exposes them to extensive noise interference that may compromise the measurement of PD signals and limit the diagnosis accuracy. Different signal processing tools have been used to extract PD signals from these noise sources; among them, the Wavelet Transform (WT) stands out. Wavelet filtering is recommended for extracting PD signals immersed in Gaussian noise [1,6].
An efficient application of wavelet processing depends on the careful selection of the parameters that concentrate the coefficients on the most suitable decomposition levels, so as to minimize the loss of PD signal information. These parameters are the WT applied, the number of decomposition levels, the wavelet functions used at each level, the method for estimating the threshold value of the obtained coefficients, and the threshold function.
Based on WT filtering theory, this chapter describes the development of a wavelet threshold function that aims to improve noise reduction in PD measurements. The new function is inspired by the logistic function [29], which is well known for its usefulness in numerous areas. Since it is customary to name functions of this type after their developers (e.g., [25]), it was designated the Fleming threshold function.
The denoising performance of the proposed threshold function was compared with the traditional Hard and Soft functions and with twelve other thresholding functions. For a fair analysis of the filtering results, 2064 simulated and measured PD pulses contaminated with uniform white noise, Gaussian white noise, and Amplitude Modulated (AM) noise were used. The results showed that our proposal outperforms, qualitatively and quantitatively, all the functions it was compared against.

Wavelet domain detection
Noise degrades the accuracy and precision of the analysis and reduces the detection limit of the instrument used in PD measurements. The WT is often used to attenuate continuous random noise (white noise) because, after a signal is decomposed in the wavelet domain, the average density of the coefficients is inversely proportional to the dyadic scale $1/2^j$ ($j$ indicates the decomposition level). That is, half of the local extrema of the coefficients do not propagate from scale $1/2^j$ to the next scale $1/2^{j+1}$, so the noise is distributed uniformly across the scales. As the wavelet coefficient distribution pattern of the PD signal (which tends to have its energy concentrated in a few decomposition levels) differs from the noise pattern, it becomes easy to identify and separate the PD signals from the noise [30-32]. However, in wavelet denoising, the attenuation applies not only to white noise but also to noise whose frequency components do not match those of the PD pulses.
Basically, the wavelet shrinkage denoising process involves three steps [13,31]:
1. Determine the WT decomposition tree (discrete WT, wavelet packet transform, stationary WT, or dual-tree complex WT) to be applied, the number of decomposition levels J, and the wavelet function to be employed at each level j (where j = 1, 2, ..., J), and then decompose the analyzed signal into its wavelet coefficients.
2. Calculate the threshold values using one of the threshold selection rules, which depend on a statistical estimate of the noise level present in the signal. Apply the calculated value in a threshold function to shrink the noise coefficients and preserve the coefficients of the signal of interest, in our case the PD pulse.
3. Reconstruct the signal by applying the Inverse Wavelet Transform (corresponding to the decomposition tree selected in the first step) to the thresholded coefficients, to obtain the filtered signal in the time domain.
Figure 2 illustrates each of the steps involved in the wavelet denoising of a digitized signal.
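The three steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the configuration evaluated in this chapter: it uses a hand-written Haar discrete WT, a level-dependent threshold built from a median-based noise estimate (an assumed stand-in for a rule such as scaledep), and the Hard function. All function names and the test signal are hypothetical.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal, levels):
    """Step 1: Haar decomposition; len(signal) must be divisible by 2**levels."""
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / SQRT2 for a, b in pairs])  # details[0] is level 1
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details


def haar_idwt(approx, details):
    """Step 3: invert haar_dwt, from the coarsest level back to the finest."""
    approx = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt.extend(((a + d) / SQRT2, (a - d) / SQRT2))
        approx = rebuilt
    return approx


def level_threshold(detail):
    """Assumed rule: median-based noise estimate times sqrt(2 ln n) per level."""
    sigma = statistics.median(abs(w) for w in detail) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(len(detail)))


def hard(w, lam):
    """Hard threshold: keep the coefficient only if it exceeds lam in magnitude."""
    return w if abs(w) > lam else 0.0


def denoise(signal, levels=3, shrink=hard):
    approx, details = haar_dwt(signal, levels)
    shrunk = []
    for detail in details:  # step 2: one threshold value per decomposition level
        lam = level_threshold(detail)
        shrunk.append([shrink(w, lam) for w in detail])
    return haar_idwt(approx, shrunk)


# Hypothetical test signal: a rectangular pulse in Gaussian white noise.
rng = random.Random(1)
clean = [5.0 if 100 <= k < 108 else 0.0 for k in range(256)]
noisy = [s + rng.gauss(0.0, 0.1) for s in clean]
filtered = denoise(noisy)
```

The `shrink` argument is where a different threshold function would be plugged in; the rest of the pipeline stays unchanged, which is the setting in which threshold functions are compared later in the chapter.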
As several parameters are involved, they should be carefully selected according to the signal characteristics, in order to maximize the signal's wavelet coefficients above the noise level. Filtering performance is thus closely related to each of these parameters, and some of them have a greater influence on the quality of the result. Determining these parameters is an optimization challenge [33]. In this chapter, we focus on improving the threshold function applied in the second step.

Fleming threshold function
In most of the wavelet denoising literature, especially work focused on PD signals, the choice of threshold function normally falls between the Hard and the Soft functions. It is well known that, for PD pulse filtering, the Hard function tends to preserve more of the signal information, providing a higher SNR and a lower Amplitude Error (AE). However, the Hard estimate is discontinuous and therefore not differentiable, which causes instability and sensitivity to small changes in the data (the pseudo-Gibbs effect). The Soft function is weakly differentiable but strongly attenuates the coefficients and, therefore, reduces the amplitude of the resulting signal.
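For reference, the two classical functions can be written as follows. This is a minimal sketch using their usual textbook definitions; the function names are our own.

```python
import math


def hard_threshold(w, lam):
    """Keep coefficients above lam unchanged; the output jumps at |w| = lam."""
    return w if abs(w) > lam else 0.0


def soft_threshold(w, lam):
    """Zero small coefficients and shrink the remaining ones toward zero by lam."""
    if abs(w) <= lam:
        return 0.0
    return math.copysign(abs(w) - lam, w)


# A coefficient just above the threshold shows the contrast: Hard keeps its
# full magnitude, while Soft removes lam from it (the attenuation noted above).
kept = hard_threshold(1.5, 1.0)
shrunk = soft_threshold(1.5, 1.0)
```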
In an attempt to get around these problems, many alternatives have been proposed. The main idea is to generate a thresholding function with high-order derivatives, which favors its use in optimization algorithms that search for the optimal parameters to be applied in the thresholding of each signal [34]. The function thus becomes adaptable to the signal being processed, improving the quality of the denoised signal.
An analysis of the threshold functions applied in PD, audio, ECG, or image processing shows that they seek improvements by combining the preservation of coefficients and magnitudes provided by the Hard function with the differentiability and smoothness provided by the Soft function. In image processing, smoothness is desirable so that the resulting image shows more pleasant contours. In signal processing, such as audio, ECG, and PD pulse processing, it is important to achieve better preservation of the signal magnitude (peak) and of the signal to noise ratio.
For this reason, many authors have explored functions that interpolate between the Soft and Hard alternatives. Examples include the Garrote described by Nasiri et al. in [19]; the Non Negative Garrote described in [12]; the Adaptive Shrinkage presented by Partha Ray in [35]; the Liu function developed by Shan Liu in [16]; the Hui function presented in [36]; the Stein and Semi-Soft functions shown in [12]; and the functions described by Zhang et al. in [37,38]. However, most of these functions cannot adapt to different signals because their transition curve at the threshold value is fixed. They also tend to smooth the coefficients more than preserve them, overlooking the fact that for PD signals the function should be closer to the Hard than to the Soft threshold function while still preserving some smoothness (differentiability) in the transition at the threshold value, which allows an improvement in the Mean Square Error (MSE) and Correlation Coefficient (CC).
Following this line of reasoning, we propose a new threshold function that is similar to the Hard function but is differentiable to higher orders and able to adjust to each signal. The proposal is based on the well-known logistic function, shown in Figure 3, which is widely used in artificial neural networks, demography, economics, probability, statistics, chemistry, etc. Eq. (1) represents the logistic function:

$$f(x) = \frac{H}{1 + e^{-\alpha (x - x_0)}} \tag{1}$$

where H is the maximum value of the curve (the numerator), α controls the slope of the curve, and $x_0$ is the value of x at the midpoint of the sigmoid, where the curve reaches half of the numerator value (Figure 3). When x tends to +∞ the curve approaches H, and when it tends to −∞ it approaches zero.
Eq. (1) enabled us to develop a threshold function for filtering signals that circumvents the problems described previously. As the objective of the thresholding process is to preserve the coefficients above the threshold value, the maximum value H must be the decomposed wavelet coefficient $\omega_{j,k}$ itself (which also plays the role of the variable x). Thus, the function maintains its symmetry when the inclination constant c of the curve is varied. Finally, the function must be moved along the abscissa so that the graph shown in Figure 3 leaves the ordinate axis and is centered on the threshold value. This is done by subtracting from the variable x the value to which we want to move the function ($x_0$), i.e., the threshold value λ is subtracted from the coefficient $\omega_{j,k}$. With these adaptations, we obtain the following threshold function:

$$\hat{\omega}_{j,k} = \frac{\omega_{j,k}}{1 + e^{\,c\,(-\omega_{j,k} + \lambda)}} \tag{2}$$

where $\hat{\omega}_{j,k}$ denotes the thresholded coefficient.
For a more efficient implementation, which does not depend on whether the coefficient $\omega_{j,k}$ is positive or negative, Eq. (2) can be rewritten using the signum (sign) function, which returns +1 if the value is positive and −1 if it is negative. Thus, we have:

$$\hat{\omega}_{j,k} = \frac{\omega_{j,k}}{1 + e^{\,c\,(-\mathrm{sign}(\omega_{j,k})\,\omega_{j,k} + \lambda)}} \tag{3}$$

With high c values, the inclination of the curve at the threshold point is such that it approaches the Hard function, but with a smoother (differentiable) transition. For low c values, the function attenuates the coefficients below the threshold value less strongly and those above it more strongly, i.e., a large share of the noisy coefficients may pass and information is lost from the coefficients that represent the signal of interest, in our case the PD pulse. With an appropriate choice of c for each processed signal, a significant improvement in the PD filtering result can be obtained. Because low values let a large number of noisy coefficients pass, it is recommended that the constant be greater than or equal to 5.
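The signed form of the threshold function described above can be sketched in Python as follows. It divides the coefficient by 1 + exp(c(λ − |ω|)), using the fact that sign(ω)·ω equals |ω|. The function name `fleming` and the overflow guard are our own assumptions for illustration; the guard only avoids a floating-point overflow and does not change the result in any meaningful way.

```python
import math


def fleming(w, lam, c=5.0):
    """Logistic-based threshold: w / (1 + exp(c * (lam - |w|)))."""
    z = c * (lam - abs(w))
    if z > 700.0:  # exp(z) would overflow; the output is effectively zero
        return 0.0
    return w / (1.0 + math.exp(z))


# At |w| = lam the logistic factor is exactly 1/2, for any value of c.
half = fleming(1.0, 1.0, c=5.0)

# With a large c the function behaves almost like the Hard threshold.
passed = fleming(3.0, 1.0, c=200.0)     # well above lam: essentially unchanged
rejected = fleming(0.2, 1.0, c=200.0)   # well below lam: essentially zero
```

Because the exponent depends only on |ω|, the function is odd: `fleming(-w, lam, c)` equals `-fleming(w, lam, c)`, so positive and negative coefficients are treated symmetrically.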
In a PD evaluation, most measurements provide signals with amplitudes in the mV range. Thus, if the WT is applied to filter the signals, the decomposed coefficients will also be in the mV range and, with a threshold rule (in our case scaledep), the threshold value λ will be small, usually smaller than 1, mainly for coefficients that contain more noise than PD components. When the threshold function is evaluated for a small threshold value (e.g., λ = 0.05), the coefficients are attenuated less sharply, as illustrated in Figure 5. Note that even for c = 200 most of the noisy coefficients can pass, unlike what was seen for the threshold value λ = 1.
One solution to this problem was to adapt Eq. (3) to the threshold value when it is considered small (here, λ < 0.5), by simply scaling the constant c that controls the inclination according to the threshold value λ (c is divided by λ). With this, Eq. (3) can be rewritten as follows:

$$\hat{\omega}_{j,k} = \begin{cases} \dfrac{\omega_{j,k}}{1 + e^{\,c\,(-\mathrm{sign}(\omega_{j,k})\,\omega_{j,k} + \lambda)}} & \text{if } \lambda \ge 0.5 \\[2ex] \dfrac{\omega_{j,k}}{1 + e^{\,\frac{c}{\lambda}\,(-\mathrm{sign}(\omega_{j,k})\,\omega_{j,k} + \lambda)}} & \text{if } \lambda < 0.5 \end{cases} \tag{4}$$

where $\hat{\omega}_{j,k}$ denotes the thresholded coefficient. Thus, when λ < 0.5, the lower the threshold value λ, the greater the rigor in discarding the coefficients (closer to the Hard function), with a significant improvement in the function's behavior, as shown in Figure 6.
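The effect of this adaptation can be checked numerically. The sketch below compares a fixed inclination with one rescaled by the threshold value when λ < 0.5. Note the assumptions: the rescaling is implemented as c/λ, which is our reading of the adaptation described above, and the function names, the guard for non-positive λ, and the sample values are hypothetical.

```python
import math


def logistic_shrink(w, lam, slope):
    """w / (1 + exp(slope * (lam - |w|))), guarded against overflow."""
    z = slope * (lam - abs(w))
    if z > 700.0:
        return 0.0
    return w / (1.0 + math.exp(z))


def fleming_fixed(w, lam, c=5.0):
    """Same inclination c regardless of the threshold value."""
    return logistic_shrink(w, lam, c)


def fleming_adaptive(w, lam, c=5.0, small=0.5):
    """Assumed adaptation: use c / lam as the inclination when lam < small."""
    if lam <= 0.0:  # no thresholding requested; return the coefficient as is
        return w
    slope = c if lam >= small else c / lam
    return logistic_shrink(w, lam, slope)


# A coefficient below a small threshold (lam = 0.05) should be discarded.
w, lam = 0.02, 0.05
leak_fixed = fleming_fixed(w, lam)        # transition too flat: much of w passes
leak_adaptive = fleming_adaptive(w, lam)  # steeper transition: far less passes
```

With these values, the fixed version lets through roughly 46% of the coefficient while the adaptive version lets through about 5%, which illustrates why the steeper transition matters for coefficients in the mV range.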
Therefore, Eq. (4) gives a function capable of adapting to different types of wavelet coefficients, varying between the Soft and the Hard threshold functions according to the chosen inclination value c. It remains to define which inclination value should be applied to the coefficients, and how.

Relevant wavelet coefficient identification
Building on the idea of identifying the coefficients that are most important in forming the PD signal, which is used in the SNRBWS method, we developed a variant of the threshold function. In this case, we chose kurtosis ($K_u$) as a statistical measure of the flatness of the probability distribution [39] of the wavelet coefficients $\omega_{j,k}$: the more peaked the distribution, the farther it is from the Normal (Gaussian) probability distribution that characterizes white noise. Kurtosis therefore serves as an indicator of whether the coefficients are noisy (kurtosis close to 3) or contain PD components (kurtosis well above 3).
Figure 7 shows the detail coefficients at level j = 1 and at level j = 6 with their respective histograms. In Figure 7(a), formed almost exclusively by noise components, the histogram is very close to the Normal probability distribution, which is confirmed by the kurtosis value of 2.9687. In Figure 7(b), the coefficients carry significant information about the PD pulse and the histogram is more peaked (leptokurtic), moving away from the Normal distribution, as indicated by the kurtosis value of 9.9612.
Then, to identify the coefficients that are most relevant in forming the PD signal, it is enough to adopt the following condition on the kurtosis value. If the kurtosis of the coefficients is greater than 4, they are considered important and the threshold function uses a lower inclination constant c, allowing more coefficients to pass. Otherwise, they are considered noisy and a much higher inclination constant is assigned (in this case $c = 10^{20}$), which eliminates a greater amount of noise and brings our function close to the Hard function. In equation form, we have Eq. (5):

$$\hat{\omega}_{j,k} = \begin{cases} \dfrac{\omega_{j,k}}{1 + e^{\,c\,(-\mathrm{sign}(\omega_{j,k})\,\omega_{j,k} + \lambda)}} & \text{if } \lambda \ge 0.5 \\[2ex] \dfrac{\omega_{j,k}}{1 + e^{\,\frac{c}{\lambda}\,(-\mathrm{sign}(\omega_{j,k})\,\omega_{j,k} + \lambda)}} & \text{if } \lambda < 0.5 \end{cases} \tag{5}$$

where $\hat{\omega}_{j,k}$ denotes the thresholded coefficient, and c takes the lower value chosen for the signal when $K_u > 4$ and $c = 10^{20}$ otherwise.
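The kurtosis rule can be sketched as follows. This is an illustrative reading of the condition described above, not the authors' implementation: the kurtosis is computed as the fourth standardized moment over all coefficients of one decomposition level, the inclination is rescaled as c/λ for λ < 0.5 (an assumption), and the function names and default values other than the limit of 4 and the constant 10^20 are hypothetical.

```python
import math
import random


def kurtosis(values):
    """Fourth standardized moment (equal to 3 for a Normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) if m2 > 0.0 else 0.0


def select_inclination(coeffs, c_signal=5.0, c_noise=1e20, limit=4.0):
    """Low c for levels that look like PD content, huge c for noise-like levels."""
    return c_signal if kurtosis(coeffs) > limit else c_noise


def fleming2_level(coeffs, lam, c_signal=5.0):
    """Threshold one decomposition level with the kurtosis-selected inclination."""
    c = select_inclination(coeffs, c_signal)
    slope = c if lam >= 0.5 else c / lam  # assumed rescaling for small lam (> 0)
    out = []
    for w in coeffs:
        z = slope * (lam - abs(w))
        out.append(0.0 if z > 700.0 else w / (1.0 + math.exp(z)))
    return out


# Hypothetical levels: Gaussian noise only, and a sparse level with one spike.
rng = random.Random(0)
noise_level = [rng.gauss(0.0, 1.0) for _ in range(2000)]
spiky_level = [0.0] * 99 + [10.0]

k_noise = kurtosis(noise_level)   # close to 3, so the level is treated as noise
k_spiky = kurtosis(spiky_level)   # far above 4, so the level is treated as signal
```

With c = 10^20 the exponent is either hugely positive or hugely negative for almost every coefficient, so noise-like levels are processed essentially as with the Hard function, while signal-like levels keep the smooth transition.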

Fleming threshold function
To evaluate the Fleming thresholding functions, we submitted 2064 signals to the wavelet denoising process. These signals included real PD measurements from HV equipment and simulated PD with different levels of uniform white, Gaussian white, and AM noise (created in the same way as described in [40]). For each signal, we compared the performance of our proposal against the classical Hard and Soft thresholding, along with the 12 other thresholding functions mentioned in Section 4.
In addition to thresholding, the wavelet shrinkage process also requires the choice of the decomposition tree, the mother wavelet, the number of decomposition levels, and the estimation method for the threshold value λ. As our goal is to evaluate only the performance of the thresholding functions, we change only these and keep fixed the other wavelet parameters needed to filter the signal. We chose the FWT structure because it is easy to implement and widely applied in the treatment of PD signals. We use the SNRBWS method to select the mother wavelet and the NWDLS method to find the number of decomposition levels. To estimate the threshold value, we chose the scaledep method [3,40,41].
Since the Fleming function depends on an inclination constant c, which controls how the decomposed wavelet coefficients are eliminated or attenuated, we also compare the results for different values of this constant.
The comparisons used statistical parameters such as Absolute Mean Error (AME), Mean Square Error (MSE), Root Mean Square Error (RMSE), Correlation Coefficient (CC), Normalized Correlation Coefficient (NCC), Energy Difference (EnD), Signal to Noise Ratio (SNR), Signal to Noise Ratio Difference (DSNR), Noise Level Reduction (NLR), and kurtosis difference (∆k), together with local similarity criteria comprising maximum Magnitude Error (MEmax), minimum Magnitude Error (MEmin), maximum Peak Time Variation (PTVmax), minimum Peak Time Variation (PTVmin), and Rise Time Variation (RTV). Some of these parameters are used to form a fitness function ($J_{Apt}$), composed of global similarity criteria ($cs_g$) and local similarity criteria ($cs_l$), that can determine the best filtering result. All these criteria are described in [42].
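A few of the global criteria listed above can be computed as in the sketch below. The formulas are the common textbook definitions and are given only as an assumption; the exact definitions adopted in this chapter are those of [42] and may differ (for example, in normalization). The function names and sample values are hypothetical.

```python
import math


def mse(reference, estimate):
    """Mean Square Error between the reference and the filtered signal."""
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def rmse(reference, estimate):
    """Root Mean Square Error."""
    return math.sqrt(mse(reference, estimate))


def ame(reference, estimate):
    """Absolute Mean Error (mean of the absolute differences)."""
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)


def cc(reference, estimate):
    """Pearson correlation coefficient between the two signals."""
    n = len(reference)
    mr, me = sum(reference) / n, sum(estimate) / n
    num = sum((r - mr) * (e - me) for r, e in zip(reference, estimate))
    den = math.sqrt(sum((r - mr) ** 2 for r in reference)
                    * sum((e - me) ** 2 for e in estimate))
    return num / den


def snr_db(reference, estimate):
    """Reference energy over residual energy, in dB (signals must differ)."""
    signal = sum(r ** 2 for r in reference)
    residual = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal / residual)


# Hypothetical four-sample example with a single error of 1 in the last sample.
reference = [1.0, 2.0, 3.0, 4.0]
estimate = [1.0, 2.0, 3.0, 5.0]
scores = {
    "MSE": mse(reference, estimate),
    "RMSE": rmse(reference, estimate),
    "AME": ame(reference, estimate),
    "CC": cc(reference, estimate),
    "SNR": snr_db(reference, estimate),
}
```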

Investigating the better inclination value c
In a first analysis, the $J_{Apt}$ fitness criterion was used to investigate the best inclination value c for each alternative of the Fleming thresholding. Table 1 shows that c = 5 produces the highest number of best results per threshold function. In Table 2, both methods produce their best mean results with a low constant, namely c = 10 for Fleming and c = 5 for Fleming 2. It is therefore recommended not to use inclination values higher than 10.

Comparison between Fleming, Hard and Soft threshold functions
The main objective of building a dedicated threshold function is to produce results superior to those of conventional functions. As seen, the functions most often applied in wavelet coefficient filtering are Hard and Soft, not only for PD signals but also for image processing, audio signals, etc.
First, Figure 8 shows the comparison between the first proposed alternative, using c = 5, and the Hard and Soft functions. According to $J_{Apt}$, the proposed function achieves a higher percentage of best results than Hard and Soft. As expected, due to its simplicity, Soft thresholding has the shortest runtime.
We then compare in Figure 9 the second proposed alternative with the Hard and Soft functions. There is a significant improvement in the number of best results, with superior performance in the AME and ∆k criteria, which did not occur with the first alternative of our function.
These results demonstrate the superiority of the proposed alternatives over the usual Hard and Soft methods in terms of the number of best results obtained. The only drawback is that our second proposal needs slightly more processing time, but this is a relatively low price to pay for better results in reducing the noisy components of PD signals. Also note that, compared with the Soft function, Hard thresholding tends to better preserve the PD pulse amplitudes and the SNR, which confirms statements made in the literature [3,22]. Figure 10 shows a signal consisting of 3 simulated PD pulses immersed in white noise and AM noise, created as in [3], together with the filtering results for the Hard, Soft, Fleming, and Fleming 2 thresholding. The Soft function tends to attenuate the peak amplitudes of the pulses considerably, while the Hard function preserves these amplitudes better. The Fleming function lets through a little more noise of negligible amplitude, but it preserves the amplitudes better than Hard and Soft. The Fleming 2 function solves this problem of the Fleming function by identifying the coefficients of greater importance. In this way, the Fleming 2 method preserves the amplitudes better than the other functions and still eliminates the low-amplitude noise seen with the Fleming function. The filtering improvement is also indicated by the fitness function, which has its highest value ($J_{Apt}$ = 16.4607) for the filtering result obtained with the Fleming 2 thresholding.

Comparison between all threshold functions
The results described in the previous subsection give a quantitative idea of the capacity of the methods used, but only the average results give a real sense of the quality of each one. We therefore also implemented the various wavelet thresholding methods mentioned in Section 4: Adapt Shrink, Garrote, Hui, Liu, Non Negative Garrote (NNG), Semi Soft (SS), Stein, Zhang 1 (Z1), Zhang 2 (Z2), Zhang 3 (Z3), Zhang 4 (Z4), and Zhang 5 (Z5). The variables required by each of these alternatives were set according to the specifications provided by the respective authors in the works that describe them.
Similarly to what was done in Table 1, we evaluated the percentage of best results considering all threshold functions and the proposed Fleming functions (compared for a constant c = 5). In Tables 3 and 4, the bold values show that the Fleming threshold performed better than the other alternatives. In terms of fitness, the function with the highest number of best filtering results was Fleming. The Stein function outperforms the others in execution time. The Soft function loses ground in practically all the evaluated criteria, confirming that it is not suitable for treating PD signals because of the high attenuation it produces in the wavelet coefficients. The average results of the evaluation parameters were also assessed, as shown in Tables 5 and 6. Note that the fitness $J_{Apt}$ = 3.20 of the Fleming function (using c = 10) is the highest of all, followed by the Garrote and Hard functions. Thus, the proposed alternatives achieve the objective of outperforming the other methods, also providing a better qualitative result in the treatment of PD pulses.
Figure 11 exemplifies the wavelet shrinkage process using each of the thresholding functions discussed above for a PD signal measured from a hydro generator. In this case, the functions we have created are superior in preserving the signal amplitudes and eliminating the noise present, especially the Fleming 2 function, which obtained the highest $J_{Apt}$ of all the functions, followed by the Fleming, Garrote, and Hard functions. The Soft function and the other thresholding alternatives deform the pulse waveforms and attenuate the peak amplitudes more strongly. The Zhang 1 and Zhang 5 functions let most of the noise pass through the denoising process, and the Zhang 4 function ends up eliminating the PD signal that we are interested in obtaining.

Conclusions
A new threshold function called Fleming was presented. It combines the quality of a strongly differentiable function with greater flexibility, which enables its optimization to provide better results in the treatment of PD signals, preserving the characteristics that are important for the diagnosis of HV equipment subjected to partial discharge analysis. The proposal is inspired by the well-known logistic function [29] and depends on a parameter that controls the inclination of the curve at the threshold value (calculated a priori). A variant of this function was also created, using a simple idea that has been little investigated in the literature: identifying the decomposed coefficients that contribute most to recovering the desired signal [3].
The results described in Section 5, in which hundreds of signals (measured and simulated) were evaluated, demonstrated the ability of the Fleming function and its Fleming 2 variant to outperform the most common functions, Hard and Soft, as well as twelve other alternatives presented in the literature [15,19,22,35-38]. The Fleming function can be applied with different inclination values, but for PD signals these values should ideally be limited to between 5 and 10 to provide the best results.
The Fleming 2 alternative showed the highest percentage of best results and the Fleming alternative showed the highest average value in terms of amplitude. Thus, if the goal is to achieve a higher number of best results, the recommended choice is to threshold the wavelet coefficients with the Fleming 2 function, whereas if the goal is to achieve better average results, the Fleming function should be considered. As for the average processing time, these functions are relatively fast compared to the other evaluated functions and do not fall far behind the classic Hard and Soft functions.
The developed thresholding functions can be extended to other types of signals, such as acoustic emissions, electrocardiogram signals, and images, among others. In each case, however, it would be necessary to investigate the appropriate values of the inclination parameter c.

Figure 8. Comparison of the best denoising results obtained with the Hard, Soft, and Fleming threshold functions.

Figure 9. Comparison of the best denoising results obtained with the Hard, Soft, and Fleming 2 threshold functions.

Figure 10. Comparison of the best denoising results obtained with the Hard, Soft, Fleming, and Fleming 2 threshold functions for a simulated PD signal.

Figure 11. Comparison of the best denoising results obtained with all the evaluated threshold functions for a PD signal measured from a hydro generator.

Table 1. Percentage of best results (by $J_{Apt}$) for the Fleming functions with various inclination constants.

Table 2. Mean results (by $J_{Apt}$) for the Fleming functions with various inclination constants.

Table 3. Percentage of best results by evaluation parameter for all threshold functions.

Table 4. Percentage of best results by evaluation parameter for all threshold functions.

Table 5. Average results by evaluation parameter for all threshold functions.

Table 6. Average results by evaluation parameter for all threshold functions.

DOI: http://dx.doi.org/10.5772/intechopen.94115