Snapshot of data used to predict network impedance. Only data from frequency ranges 10–10.07 kHz and 499.93–499.97 kHz are shown.

## Abstract

In this chapter, the random forest-based ensemble regression method is used to predict powerline impedance in the powerline communication (PLC) narrowband frequency range. It is found that while the PLC load transfer function, phase, and frequency are crucial to powerline impedance estimation, data multicollinearity can adversely impact prediction accuracy and lead to excessive mean square error (MSE). High MSE is obtained when transfer function data corresponding to many different PLC load ratings are used together for random forest ensemble regression. Lower MSE, indicating more accurate impedance prediction, is obtained when PLC load transfer function data is used selectively. Using data corresponding to 200, 400, 600, 800, and 1000 W PLC load transfer functions together led to poor impedance prediction, while using a smaller amount of carefully selected data led to better impedance prediction. These results show that artificial intelligence (AI) methods such as random forest ensemble regression and a deterministic data-optimization approach can be utilized for smart grid (SG) health monitoring applications using PLC-based sensors. Machine learning can also be applied to the design of better powerline communication signal transceivers and equalizers.

### Keywords

- random forest
- regression
- impedance
- data quality
- prediction
- ensemble
- machine learning
- smart grid
- deterministic artificial intelligence

## 1. Introduction

The utilization of powerline communication (PLC) as a tool for actualizing a smart grid (SG) has grown beyond its traditional uses for two-way SG communication, advanced metering infrastructure (AMI) applications, demand response, and power system control. As shown in Figure 1, cameras that can transmit data using PLC are now being used for monitoring of power system assets installed in remote locations [1]. PLC is also being extensively used for broadband Internet applications [2], consumer home automation applications [3], facilitating grid-wide artificial intelligence (AI) applications [4, 5, 6, 7, 8, 9, 10], monitoring grid health using PLC modems as sensors [11], etc.

Despite these uses, there are still numerous challenges militating against a more effective deployment of PLC for SG applications. Because the powerline is designed primarily for power transmission, it is a harsh environment for communication. As such, there exist the problems of varying impedance, white noise and various nonwhite noise types, and excessive frequency-dependent attenuation [12, 13, 14, 15]. To ameliorate these problems and make PLC more useful for the SG, different parts of the powerline used for PLC, as shown in Figure 2, can be optimized from the transmission (TX) end to the receiving (RX) end [16].

For PLC to be particularly useful for grid health monitoring, many researchers worldwide have focused on the problem of powerline impedance estimation. Powerline impedance is a very important parameter in the design of PLC transceivers and in installing a modern grid architecture [17]. In PLC, to achieve maximum power transfer between the PLC transmitting and receiving ends, the powerline TX (Figure 2), transmission line, and RX impedances must always be known by the impedance matching network [17]. PLC impedance, however, is time varying because electrical loads are continually connected to and disconnected from PLC networks, which leads to PLC network impedance mismatches [18]. Accurate, real-time impedance information can be used to match impedance variables in PLC couplers and so decrease PLC data attenuation [19]. Online, real-time knowledge of PLC network impedance is also essential to overall grid health monitoring of the SG. In addition, real-time PLC-based impedance information can be useful for event detection [20], reducing the need to install very expensive phasor measurement units (PMUs).

To improve on available methods of powerline impedance estimation, the authors in [13] used an algorithm to design an adaptive inductor-capacitor-resistor-capacitor (LCRC) impedance matching circuit that addresses the impedance matching problem in the PLC narrowband frequency region. The authors in [17] measured impedance and attenuation of the PLC at the CENELEC bands in rural, urban, and industrial use cases, and produced a set of formulas that can be used to deduce PLC impedances in view of load variations on PLC networks. In [18], the authors produce a statistical model of PLC network impedance. The work discussed in [19] yields a novel real-time impedance estimation method based on channel frequency response and the machine learning variational mode decomposition (VMD) method. In [20], the authors propose a method by which powerline impedance can be estimated using a device status detection algorithm and individual device energy and impedance signatures. In [21], the authors presented results of work in which the real-time estimation of powerline impedance is based on modal analysis theory, while the authors in [22] studied the design of an optimal front-end receiver impedance that maximizes signal-to-noise ratio (SNR) in broadband PLC.

One significant drawback of the majority of existing PLC impedance estimation methods, however, is that they need dedicated equipment and knowledge of the network topology. This is the problem that our approach in the present work seeks to solve. We present a deterministic machine learning-based PLC impedance estimation method by which common PLC network data such as the PLC channel load transfer function, frequency, and phase can be used to estimate and predict PLC network impedance. A benefit of the deterministic AI impedance prediction approach adopted in our work is that the data needed for impedance estimation and prediction is kept small, so it can be easily stored in low-memory powerline network devices. In Section 2, we present a new set of results that shows the transfer function and attenuation profile of PLC in the narrowband frequency bands based on electrical loads connected to the powerline. In Section 3, we briefly discuss the random forest ensemble method and describe how we used PLC network data to optimize PLC impedance prediction in the narrowband frequency bands. Section 4 presents results and discussion, and Section 5 concludes the chapter.

## 2. New results on powerline communication attenuation profile in the PLC narrowband frequency bands based on loads on the powerline

In the literature, there exist several PLC models that are used to evaluate the behavior and performance of PLC networks. In Philipps’ echo model, the PLC transfer function is given in [14] as

In Eq. (1), *N* is the number of possible signal propagation paths, with each path in *N* delayed by time factor

Similar to Philipps’ echo model, *N* number of significant paths is

In Eq. (3),

In Eq. (4), A is the constant coefficient for frequency response adjustment, and M is the number of significant paths. The weighting factor corresponding to each significant path is

Finally, a 520 W coffee maker is added to make the overall load approximately 1500 W. In each case, the attenuation profile of the powerline is measured with the aid of the de-embedded E5071C VNA. The indoor powerline cable used for this experiment is the 10 AWG Romex SIMpull CU NM-B cable. To examine the effect of distance, the cable length is extended in increments of 100 m. Initially, a 150 m Romex indoor cable is used, and the attenuation profile is measured using the VNA when a 500 W load is connected to the 150 m cable located between the two ends of the VNA (Figure 3). The loads are subsequently increased to 1000 and 1500 W. The Romex cable length is then increased to 250 and 350 m, and the network loads and measurements are repeated. The resulting attenuation profiles are shown in Figures 4–6.

From Figures 4 to 6, the effect of powerline length is noticeable: the VNA shows a profile that is increasingly attenuated as PLC channel length increases. Electrical loads on the channel also have a significant effect on the attenuation profile. The attenuation profile in Figure 4 shows more channel notches as more electrical loads are added to the network. When loads on the channel are only 500 W, the channel shows fewer notches than when the loads increase to 1000 and then to 1500 W: the more the loads, the more the notches. This indicates that data or signals sent on a PLC channel suffer more attenuation when more electrical loads exist on the channel than when fewer loads exist. Similar channel load effects are observed on the 250 m and 350 m PLC channels in Figures 5 and 6, respectively. However, the channel profile exhibits a characteristic similar to that described by the Zimmermann and Dostert PLC channel model.

Thus, the Zimmermann and Dostert model is modified to give a narrowband channel model that considers the effect of PLC channel load. The resulting model, which focuses primarily on the effect of electrical loads on the PLC channel in the PLC narrowband region, is shown in Eq. (5); it is simulated with MATLAB and graphed in Figure 7. Figure 7, which is the derived model based on Eq. (5), is generally similar to the load-based PLC channel profiles shown in Figures 4 to 6. In Eq. (5), μ is the channel load index, where μ ∈ {1, 2, 3, …, n}. To replicate the channel profile of Figure 7, the load factor μ can be increased from 1 to n based on discrete channel load increments of 200 W. For the purpose of clarity, in Eq. (5), N is the number of significant paths,

The precise and deterministic nature of the channel load index μ in Eq. (5), together with the fact that the amount of PLC channel load is directly related to the PLC channel impedance, is exploited in this chapter to reduce the amount of data needed by the machine learning ensemble algorithm to predict the PLC channel impedance. Our approach can thus be viewed as a deterministic data-optimization approach to PLC impedance prediction. In deterministic AI, when data is produced by systems whose behavior is governed by fundamental physical laws, those laws can be used to reduce the data needed in machine learning and other AI applications [24]. Deterministic AI methods have been successfully applied in large engineering systems such as altitude control of spaceships suffering from loss of vital parts [25]. They have also been applied to achieve better precision in AI algorithms [26] used for adaptive control of actuators [24, 27], in system identification [26], and in plant control [28], with results indicating that the deterministic AI method often leads to better prediction precision and to less superfluous network data.

## 3. Machine learning ensemble regression method for PLC channel impedance estimation

PLC network impedance has a significant effect on communication over powerline networks. The line impedance is inversely related to communication distance: the higher the line impedance, the shorter the distance over which good communication can be achieved over the powerline. Also, if the powerline load impedance is lower than the PLC network transmitter impedance, the load provides an easier path to ground for the communication signal, and the signal is easily attenuated. Hence, due to the importance of impedance [29] to the success of communication over the powerline, it is essential that the impedance information of the PLC network is always available at the PLC transmitter and receiver ends [30, 31]. However, it is challenging to predict the PLC network impedance in real time since electrical appliances are continually switched on and off, causing the network impedance to vary. In addition, VNAs, PMUs, and other equipment useful for measuring PLC network impedance are expensive, so it is impractical to install such equipment at all possible nodes on the network for grid health monitoring. Hence, in this chapter, we devise a machine learning and deterministic data optimization-based approach by which PLC network impedance can be predicted using common PLC network load data. The machine learning approach adopted for this work is the random forest ensemble regression method.

In the literature, different types of machine learning and artificial intelligence methods have been used for different types of engineering and large network problems [32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54]; however, the random forest ensemble regression method has proven very useful: it is known to have high prediction accuracy, it is efficient on large datasets, and it retains good predictive accuracy in cases of missing data [52, 53, 54].

Random forest is an ensemble method that consists of many decision trees. For classification, it outputs the class that is the mode of the classes output by the individual trees; for regression, it outputs the mean of the individual tree predictions. Random forest works by training each decision tree on a randomly selected subset of a large data set and then aggregating the results of the trees to form an ensemble result that often yields better prediction. In Figure 8, each decision tree is trained using a method called bootstrap aggregating.
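As a minimal sketch of bootstrap aggregating, the following Python code draws bootstrap samples (sampling with replacement), fits one very simple base learner per sample, and averages the predictions. The base learner here is a one-split regression stump rather than a full decision tree, and the function names and toy frequency/impedance values are hypothetical; none of this is taken from the chapter's implementation.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression stump by minimizing squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:  # nothing lies beyond the largest threshold
            continue
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # every x in the sample is identical: predict the mean
        return (xs[0], mean(ys), mean(ys))
    return best[1:]  # (threshold, left mean, right mean)

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def bagged_fit(xs, ys, n_trees=25, seed=0):
    """Train one stump per bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def bagged_predict(stumps, x):
    """Aggregate by averaging the individual predictions (regression)."""
    return mean([predict_stump(s, x) for s in stumps])

# Hypothetical toy data: frequency in kHz and impedance in ohms.
xs = [10, 60, 110, 160, 210, 260, 310, 360, 410, 460]
ys = [2.0] * 5 + [19.0] * 5

stumps = bagged_fit(xs, ys)
low = bagged_predict(stumps, 10)
high = bagged_predict(stumps, 460)
print(low, high)
```

Each stump sees a slightly different resample of the data, so averaging smooths out the variance of any single learner; a random forest applies the same idea with full decision trees and random feature selection at each split.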

At each split node and the resulting child nodes, another metric called the information gain, which is the difference between the uncertainty of the starting node and the weighted impurity of the two resulting child nodes, is used to decide which feature to split the data on. The combined use of the Gini index, information gain, bootstrap aggregation, and decision trees serves to make the random forest a valuable method for both classification- and regression-based predictions.
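The information gain described here can be illustrated with a short sketch, assuming the Gini impurity as the uncertainty measure and a hypothetical set of class labels. The function names are illustrative only. Note that regression trees typically use a variance or squared-error criterion in place of the Gini impurity; the gain calculation has the same form.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def information_gain(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

# Hypothetical labels for eight samples at a node.
parent = ["low"] * 4 + ["high"] * 4

# A split that separates the classes perfectly.
perfect_gain = information_gain(parent, ["low"] * 4, ["high"] * 4)

# A split that leaves both children as mixed as the parent.
useless_gain = information_gain(parent, ["low", "low", "high", "high"],
                                ["low", "low", "high", "high"])

print(perfect_gain, useless_gain)
```

The tree-building procedure evaluates candidate splits in this way and keeps the one with the largest gain.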

In our implementation of the random forest method for PLC impedance prediction, the de-embedded E5071C VNA is used to measure the network when electrical loads of different types and power ratings are plugged into the network of Figure 3. Loads of 200, 400, 600, 800, and 1000 W are separately plugged into the network, and PLC variables such as transfer function, frequency, phase, and distance data are obtained and used to predict PLC network impedance. Measurement data are obtained from the VNA in Excel file format, and the resulting Excel files are loaded onto an Ubuntu 18.04 Linux system. Python, with libraries such as sklearn, pandas, and numpy, is used to import the Python-based random forest regression (ensemble-GradientBoostingRegressor) library to accomplish PLC impedance prediction. In each data set loaded onto the Linux system, the last column, which is the one to be predicted by the ensemble random forest method, is the PLC network impedance.

The regressor settings are as follows: the data are split into training and test sets in an 80:20 ratio, the number of estimators in each case is 500, the maximum depth is 4, and the learning rate is 0.01. This learning rate is selected based on similar learning rate selection in the literature [32]. Since it is established in the literature that the random forest ensemble regression method works well for prediction, our objectives include finding the features of the PLC network that yield the lowest possible MSE. Each data set exported from the de-embedded E5071C VNA to the Linux system contains 49,101 data points. Initially, only the frequency data and the transfer function of the 1000 W network load are used to predict PLC network impedance. A snapshot of the data set and the predicted impedance yielded by the Linux system is shown in Table 1. A plot of the predicted impedance using only those two variables (frequency and 1000 W transfer function data) is shown in Figure 9.

Frequency (kHz) | Transfer function (1000 W loads) | Predicted impedance (Ω) |
---|---|---|
10 | −0.041888548 | 2.683132621 |
10.01 | −0.083776438 | 2.620954991 |
10.02 | −0.125664329 | 2.558777556 |
10.03 | −0.167552219 | 2.496600312 |
10.04 | −0.209440109 | 2.434423254 |
10.05 | −0.251327999 | 2.372246379 |
10.06 | −0.293215889 | 2.310069682 |
10.07 | −0.335103779 | 2.24789316 |
499.93 | −0.502655337 | 19.13619916 |
499.94 | −0.544543227 | 19.07860685 |
499.95 | −0.586431116 | 19.00921205 |
499.96 | −0.628319005 | 19.92970978 |
499.97 | −0.670206895 | 19.84181646 |
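The regressor settings described in Section 3 map onto scikit-learn roughly as in the sketch below. It runs on synthetic data, not on the measured VNA data: the frequency, transfer function, and impedance arrays are assumptions made only so that the example executes. It uses `GradientBoostingRegressor`, the class named in Section 3, because that class accepts a `learning_rate` argument, whereas scikit-learn's `RandomForestRegressor` does not.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for one exported data set (assumption, not measured data).
n = 1000
frequency_khz = np.linspace(10.0, 500.0, n)
transfer_function = -0.05 * np.sqrt(frequency_khz) + rng.normal(0.0, 0.01, n)
impedance_ohm = 2.0 + 0.035 * frequency_khz + rng.normal(0.0, 0.2, n)

# Features occupy the leading columns; the target (impedance) is the last column.
data = np.column_stack([frequency_khz, transfer_function, impedance_ohm])
X, y = data[:, :-1], data[:, -1]

# 80:20 split between training and test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 500 estimators, maximum depth 4, learning rate 0.01.
model = GradientBoostingRegressor(
    n_estimators=500, max_depth=4, learning_rate=0.01, random_state=0
)
model.fit(X_train, y_train)

# Mean square error on the held-out 20 percent.
mse = mean_squared_error(y_test, model.predict(X_test))
print(f"test MSE: {mse:.4f}")
```

With real measurements, the three synthetic arrays would be replaced by the columns of an exported Excel file, for example loaded with `pandas.read_excel`.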

## 4. Results and discussion

The MSE obtained using only two PLC network variables for impedance prediction is 0.005. As observed in Figure 9 and from the MSE result, there is clearly an undesired overfitting effect when only two features are used to predict the PLC network impedance: the fitted regression line (red) and the impedance prediction data (blue) almost perfectly overlay each other. To improve prediction accuracy, several PLC network parameters, including transfer functions for 200, 400, 600, and 800 W loads, distance (150, 250, and 350 m), and phase data, are measured and added to our prediction data. A snapshot of the new data used is shown in Table 2.

Frequency (kHz) | Transfer function (200 W) | Transfer function (400 W) | Transfer function (600 W) | Transfer function (800 W) | Transfer function (1000 W) | Transfer function (1500 W) | Distance (m) | Phase (degree) | Predicted impedance (Ω) |
---|---|---|---|---|---|---|---|---|---|
10 | −0.278762778 | −0.228976545 | −0.467543787 | −0.534524537 | −0.354669876 | −0.762554545 | 150 | −0.7540 | 2.496600312 |
10.01 | −0.245907689 | −0.378620012 | −0.290837890 | −0.657890436 | −0.876453095 | −0.879698379 | 150 | −0.8796 | 2.667887534 |
10 | −0.278762778 | −0.378400133 | −0.287910770 | −0.561996103 | −0.678567899 | −0.799721056 | 250 | −2.8903 | 2.356813698 |
10.01 | −0.300135667 | −0.333501255 | −0.315340097 | −0.559899856 | −0.668970778 | −0.789987231 | 250 | −2.7646 | 3.325678210 |
10 | −0.278762778 | −0.300198951 | −0.315487632 | −0.495908178 | −0.712089758 | −0.777180790 | 350 | −3.1416 | 2.996754013 |
10.01 | −0.278762778 | −0.343590108 | −0.456790799 | −0.466320126 | −0.787710967 | −0.899870018 | 350 | 3.0159 | 3.132459810 |

Results of using these data sets are shown in Figures 10 to 17. In Figure 10, using the 200, 400, 600, 800, and 1000 W transfer function data together with frequency, phase, and 150 m distance data does not yield a very good impedance prediction, since the measured MSE is 59.59. In Figure 11, 250 m distance data is added to the data set that yielded the result of Figure 10. The prediction in this instance is also poor, with an MSE of 59.02. Likewise, when 350 m data is added (Figure 12), the MSE is 59.17, indicating poor performance by the ensemble regression method. Next, all the distance data were removed, leaving only the 200, 400, 600, 800, and 1000 W transfer function, phase, and frequency data. The impedance prediction result is shown in Figure 13, and the MSE is 37.82. It is also observed in Figure 13 that the collective impedance result approaches the true values of between 17 and 25 Ω for home PLC impedance [17], as shown by the inserted blue ring. The prior results in Figures 10 to 12 do not show such improvement.
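The abstract attributes the high MSE of the larger feature sets to data multicollinearity. One simple way to look for it, sketched below under stated assumptions, is to compute the pairwise Pearson correlation between feature columns and flag pairs whose magnitude exceeds a threshold. The column values, the 0.9 threshold, and the function names are hypothetical; this is not the procedure used in the chapter.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def collinear_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for column pairs with |r| at or above threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical feature columns: tf_400w is built as a linear function of tf_200w.
tf_200w = [-0.28, -0.25, -0.30, -0.27, -0.31, -0.26]
columns = {
    "tf_200w": tf_200w,
    "tf_400w": [1.5 * v - 0.01 for v in tf_200w],
    "phase": [0.5, -0.5, 0.5, -0.5, -0.5, 0.5],
}

flagged = collinear_pairs(columns)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b}: r = {r:.3f}")
```

Columns flagged this way carry largely redundant information, which is one motivation for dropping some of them before regression.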

To further optimize the impedance prediction using common PLC network data, only the 200, 400, and 600 W transfer function, frequency, and phase data were used to predict impedance. The result is shown in Figure 14.

It is observable (using the inserted blue ring) that the impedance prediction is even better. The MSE for this result is 31.63. Figure 15 shows the result of using only the 200 and 400 W transfer function, phase, and frequency data. The MSE in this instance is only 17.02. In Figure 16, only the 400 W transfer function, frequency, and phase data were used for prediction, and the resulting MSE is 22.79. From the foregoing, it can be deduced that using two columns (200 and 400 W) of PLC network electrical load transfer function data together with frequency and phase data gives the lowest MSE of the feature sets tested when the random forest regression method is used for PLC network impedance prediction. To further test this deterministic hypothesis, a different set of 200 and 400 W loads is plugged into the PLC network, and the resulting impedance prediction, shown in Figure 17, yielded an MSE of only 17.12.
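The feature-set comparison described in this section can be organized as a loop that refits the regressor on each subset and records the test MSE. The sketch below assumes synthetic placeholder columns and hypothetical column names such as `tf_200w`; its printed MSE values say nothing about the measured results reported here. It only shows how such a comparison might be structured with scikit-learn.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 400

# Synthetic placeholder columns (assumption); real columns would come from the VNA export.
cols = {
    "frequency": np.linspace(10.0, 500.0, n),
    "phase": rng.uniform(-np.pi, np.pi, n),
}
for watts in (200, 400, 600, 800, 1000):
    cols[f"tf_{watts}w"] = rng.normal(-0.5, 0.2, n)
impedance = 2.0 + 0.035 * cols["frequency"] + rng.normal(0.0, 0.5, n)

# Transfer function subsets mirroring the combinations compared in this section.
feature_sets = {
    "200-1000 W": ["tf_200w", "tf_400w", "tf_600w", "tf_800w", "tf_1000w"],
    "200-600 W": ["tf_200w", "tf_400w", "tf_600w"],
    "200+400 W": ["tf_200w", "tf_400w"],
    "400 W": ["tf_400w"],
}

def subset_mse(tf_columns):
    """Fit on the chosen transfer functions plus frequency and phase; return test MSE."""
    names = tf_columns + ["frequency", "phase"]
    X = np.column_stack([cols[name] for name in names])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, impedance, test_size=0.2, random_state=0
    )
    model = GradientBoostingRegressor(
        n_estimators=500, max_depth=4, learning_rate=0.01, random_state=0
    )
    model.fit(X_tr, y_tr)
    return mean_squared_error(y_te, model.predict(X_te))

results = {label: subset_mse(columns) for label, columns in feature_sets.items()}
for label, mse in sorted(results.items(), key=lambda item: item[1]):
    print(f"{label:>12}: MSE = {mse:.3f}")
```

Keeping the split, the random seed, and the regressor settings fixed across subsets means that differences in MSE come from the choice of features alone.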

## 5. Conclusion

In this chapter, a new set of attenuation profile results based on the load ratings of electrical devices on a PLC network in the narrowband PLC frequency range has been obtained. The new results can be used to model the attenuation profile of the PLC network when the number and ratings of electrical loads on the network are considered. In addition, the random forest ensemble regression method is used to predict the PLC network impedance using commonly available PLC network data.

The MSE results show that using only four features, namely two columns of network load transfer functions, frequency, and PLC network phase data, leads to optimized impedance prediction for the PLC network. Our results indicate that commonly available PLC network devices, reinforced with a deterministic data-optimization approach, can be used for PLC impedance prediction. This differs from the state of the art, in which very expensive devices are used for PLC network impedance measurement and prediction.

## Acknowledgments

The authors wish to recognize the contributions of Schweitzer Engineering Laboratories (SEL), Pullman, Washington, which donated research equipment that in part facilitated this research work.