Rainfall Erosivity and Its Estimation: Conventional and Machine Learning Methods

Rainfall erosivity concerns the ability of rainfall to cause erosion on the surface of the earth. The difficulty in modeling the distribution, the size, and the terminal velocity of raindrops in relation to the detachment of soil particles led to the use of more tractable rainfall indices. Thus, in the universal soil loss equation (USLE), the coefficient of rainfall erosivity, R , was introduced. This coefficient is based on the product of the rainfall kinetic energy of a storm and its maximum 30-minute intensity. An important problem in the application of USLE and its revisions in various parts of the world concerns the computation of R , which requires pluviograph records with a length of at least 20 years. For this reason, empirical equations have been developed that are based on coarser rainfall data, such as daily, monthly, or yearly, which are available on larger spatial and temporal extents. However, the lack of denser data is dealt more effectively by means of machine learning methods. Computational systems for this purpose were recently developed based on feed-forward neural networks, yielding significantly better results.


Introduction
Soil erosion is the detachment, transport, and deposition of soil particles, and it takes place in the course of one or more processes. These processes are natural, such as rainfall with the energy that it carries, surface water runoff, wind, and gravity. The observed processes may be combined with human activities and works, such as intensive cultivations, deforestation, construction of public works, and mining activities [1].
Earth terrain is strongly influenced by erosion, which progresses in geological time, in case it is a purely natural process. Otherwise, when human activities and works are involved, the phenomenon of erosion is accelerated. This accelerated erosion may cause uncontrollable soil loss with negative consequences for human health, the natural ecosystems, the climate, as well as the economy.
The universal soil loss equation (USLE) with its revisions (RUSLE, RUSLE2) can be used to predict the average rate of soil erosion by grouping the numerous parameters that affect soil loss into a set of factors and is the prediction equation most widely used in the world [2]. According to Nearing et al. [3], "Soil loss refers to the amount of sediment that reaches the end of a specified area on a hillslope that is experiencing net loss of soil by water erosion." Development of soil loss equations began in 1940, and the universal soil loss equation was developed at the National Runoff and Soil Loss Data Center established in 1954 by the Science and Education Administration [4].
The USLE represents an erosion model developed for the prediction of soil losses in an average long-term sense. It is based on the knowledge of the physical characteristics of the field area under study, along with the prevailing cropping and management system. USLE has been widely tested in field conditions, and therefore its validity has been established.
USLE consists of the product of six factors, whose numerical values can be specified depending on a particular location. There is a considerable variation in the resulting erosion values, if observations are limited within short periods. However, long-term averages correspond more satisfactorily to predictions.
The soil loss equation is where the meaning of the symbols is given in [5] exactly as follows: • A is the computed soil loss per unit area, expressed in the units selected for K and for the period selected for R.
• R, the rainfall and runoff factor, is the number of rainfall erosion index units, plus a factor for runoff from snowmelt or applied water where such runoff is significant.
• K, the soil erodibility factor, is the soil loss rate per erosion index unit for a specified soil as measured on a unit plot, which is defined as a 72.6-ft length of uniform 9% slope continuously in clean-tilled fallow.
• L, the slope length factor, is the ratio of soil loss from the field slope length to that from a 72.6-ft length under identical conditions.
• S, the slope-steepness factor, is the ratio of soil loss from the field slope gradient to that from a 9% slope under otherwise identical conditions.
• C, the cover and management factor, is the ratio of soil loss from an area with specified cover and management to that from an identical area in tilled continuous fallow.
• P, the support practice factor, is the ratio of soil loss with a support practice like contouring, strip cropping, or terracing to that with straight-row farming up and down the slope.
Out of the six factors, only two have units, the erosivity R and the erodibility K. The remaining four, i.e., slope steepness, slope length, cropping management, and components of control practice, are dimensionless factors, because they represent ratios in relation to the unit plot [4].
Regarding R, it needs to be observed that its units stem from its definition as the energy multiplied by maximum 30-minute intensity term. In contrast, the erodibility factor values were determined empirically by calibration against measured erosion data. This is important due to the fact that if there is a change on the definition of the R factor, then the values of the K factor should be recalculated [3].

Computation of the R factor
The R factor is computed for time periods greater than 20 years, so that wet or dry rainfall periods will be incorporated, eliminating any bias. It is set equal to the average of the sum of the erosivity values for every year's rainfalls. The R coefficient is defined as the product of the kinetic energy of an erosive rainfall event by the maximum intensity of a 30-minute duration rainfall, during the rainfall event: where n is the number of years in the record, m j the number of erosive events during year j, and EI 30 (MJ mm ha À1 h À1 ) the rainfall erosivity for event k.
The erosivity of a particular event is where e r is the kinetic energy per unit of rainfall (MJ ha À1 mm À1 ); v r is the rainfall depth (mm) for the time interval r of the hyetograph, which has been divided into r ¼ 1, 2, …, m subintervals; and I 30 is the maximum rainfall intensity for a 30-minute duration.
For the computation of e r , numerous empirical relations ( Figure 1) involving rain intensity have been proposed [6]. Thus, in USLE, the empirical relation of Wischmeier and Smith [4] was used: with the upper limit of 0.283 MJ ha À1 mm À1 if i>76 mm h À1 , where i is rainfall intensity. Later, in RUSLE [5], the exponential relation of Brown and Foster [7] was used: e r ¼ 0:29 1 À 0:72 e À0:05i À Á Upon comparison of the equations used in USLE and RUSLE, McGregor and Mutchler [8] found out that for rain intensities up to 35 mm/h, Eq. (5) yields results by 12% less than those of Eq. (4), and they proposed a modification in the value 0.05 that controls the rate of change of e r with i. Thus, in RUSLE, the following relation has been adopted: e r ¼ 0:29 1 À 0:72 e À0:082i À Á Eq. (5) was developed for an application concerning reordered rainfall intensity data and not natural rainfall data [6]. The systematic underestimation of R has been shown in many other studies [3,[7][8][9], so Eq. (5) should not be used for calculations. The rules that apply in order to single out the storms causing erosion and to divide rainfalls of large duration in RUSLE2 [10,11] are: 1. A rainfall event is divided into two parts, if its cumulative depth for duration of 6 hours at a certain location is less than 1.27 mm.

2.
A rainfall is considered erosive if it has a cumulative value greater than 12.7 mm.
3. All rainfalls with extreme EI 30 values and a return period greater than 50 years are deleted.
The current revised version of USLE, RUSLE2 [10], introduced the erosivity density (ED), as a measure of rainfall erosivity per unit rainfall to develop erosivity values for the USA, because ED requires shorter record lengths, as 10 years leads to acceptable results, allows more missing data than R, and is independent of the elevation: where m j is the number of storms during a time period j, EI 30 ð Þ k the erosivity of storm k, and P j the total precipitation height for the period j (usually monthly or annual).
Vantas et al. [12] using a numerical scheme with data from Greece showed that ED is more robust against the presence of missing precipitation values, as reflected in the following results: • Using only 5% of the data, annual R values are underestimated on average by 85%, when the average estimation error of ED values is 50%.
• In the presence of 50% of the data, R values are underestimated by 49%, while at the same time, the estimation error of ED is 20%.
The use of ED permits the utilization of data consisting of daily rainfall values that are more abundant in comparison to data from pluviographs. The details of the method, which was employed in the USA in combination with kriging, may be found in the RUSLE2 Science Documentation [10].

Methods for the estimation of R
The lack of dense time series from pluviographs gives rise to many difficulties in the application of USLE and of its revisions in many countries. In order to cope with these difficulties, many models have been developed, based on rainfall depth values for various time steps (daily, monthly, and yearly), under specific spatial parameters and climatological data. Thus, a number of researchers reported good correlation results between R and yearly rainfall in many countries of the world by using various schemes ranging from simple parametric equations to geostatistical models.
A different model followed in several countries concerns the correlation, by the use of parametric equations, of the yearly values of R to the monthly values of rainfall depth. Those equations were applied in Venezuela [27], Germany [28], the USA [15], Italy [29], Iran in combination with the yearly and the maximum daily rainfall per year [30], North Africa [31], Morocco [32], Nigeria [33], South Italy and Southeast Australia [34], Uruguay [35], and Sudan [36].
Finally, for the estimation of the yearly value of R, several researchers used parametric equations for the initial determination of the daily R value and the corresponding daily rainfall depths. Those equations were applied in the Eastern USA [37], Australia [38], Spain [17], Kenya [39], China [40], Nigeria [41], North Africa [42], South Italy [43], Peru [44], and Slovenia [45].
Indicatively, some of the empirical equations of the literature are given below in dimensionless form, representing various parametrizations and methodologies. In West African countries, Roose [13] developed a simple linear relation between yearly values of R and rainfall P: where α 1 και β 1 are linear regression parameters.
In the USA, Renard and Freimund [15] proposed the following nonlinear equations for continental regions of the country, in which no rainfall intensity data exist, also for yearly values of R and rainfall P: where α 2 , β 2 , α 3 , β 3 , and γ 3 are nonlinear regression parameters.
In Morocco, Arnoldus [32] used the modified Fournier index, which is equal to where P is the yearly rainfall and p m, i the monthly rainfall from January to December, and developed the formula where α 5 and β 5 are nonlinear regression parameters. Richardson et al. [37] used the daily rainfall values for the estimation of the corresponding daily erosivity values R d : where the parameters α m,s and β m,s correspond to station s and to month m and are evaluated by means of nonlinear regression.
A serious issue with the applicability of Eq. (13) is the difficulty to compute its coefficients in dry areas or months due to small number of erosive events. Another issue, even in wet periods, is the amount of missing values that also leads to a small set of observations and the same problems [46].
In order to reduce the computation burden of Eq. (13), Yu and Rosewell [38] proposed the following empirical equation: where the parameters α s , β s , and γ s are different only for the different stations s and ω is the month with the highest average of daily R values.
It must be noted that the logarithmic transformation and the subsequent application of linear regression must be avoided in the above equations, because in that case the minimization is applied to the average of the differences of the logarithms of the aggregated daily EI 30 values (coming from pluviograph data) and the logarithms of the estimated R d values (coming from daily rainfall data) and not of the corresponding absolute values themselves. Specifically, the application of the log transformation to Eq. (13) resulted in a À10% systematic error, when data from RUSLE2 was used, as noted on p. 59 of [10].
A different approach for the estimation of R, in the presence of missing values, was followed by Vantas and Sidiropoulos [47]. They compared empirical equations to machine learning methods, extracting for the latter methods better results, in terms of statistical significance, compared to the former methods.

Data
The data utilized in the analysis were taken from the Greek National Bank of Hydrological and Meteorological Information [48] and came from 84 meteorological stations (Figure 2). By adding up the years of record registered on the various stations, the time series comprised a total of 2425 years of pluviograph records, with an average of 28.9 years per station. The time step was 30 min for the time period from 1953 to 1997, and the data coverage was equal to 56%.
For the above described data, it was deemed useful to change the time step and aggregate the data into weekly values, because 57% of the recordings were associated with storms occupying time periods covering parts of more than one calendar day, although only 17% of the storms had duration of more than 24 hours.
Under the time step of 1 week, it was found out that 80% of the values emanated from a single storm. The storms that were crossed temporally by 2 consecutive weeks were assigned to the first of the 2 weeks, and they comprised only 7% of the data. Thus, through the use of weekly instead of daily values, divisions of storms due to time step were prevented, and a measure of rainfall duration in days has been added to the observations. Table 1 gives the concise statistics of the cumulative rainfall, the duration, and the erosivity EI 30 of the erosive events that resulted following RUSLE2 rules.

Exploratory data analysis
In the sequel, a series of three figures is given that contain: 1. The relation of the precipitation height of the erosive events to the EI 30 value in Figure 3, where a linear relation appears with a wide opening between the logarithms of rainfall values and the logarithms of EI 30 . Figure 4, where it turns out that in July its maximum appears with an almost sine-like variation of the median values per month.  3. The relation of the duration of erosive events to EI 30 in Figure 5, where it is shown that for the selected duration widths, the relation to the median values of EI 30 is parabolic.

The relation of EI 30 to the months in
In these three figures, local polynomial regression fitting (LOESS, [49]) was used as a nonparametric method to produce smooth curves through the plotted variables. In that method, the fitting is done locally for a given point x using all the points in the neighborhood of x and the distances of these points from x as weights.
From the above diagrams and from the statistics of the erosive events, it can be seen that the estimation of EI 30 values is a nonlinear problem with skewed data and with expected results of a relatively low accuracy. This point is particularly reinforced by Figure 3, in which it is shown that for a specific rainfall depth, such as   30 values, from about 10 to 700 MJ mm ha À1 h À1 , is to be assigned, corresponding to the erosive events connected to 30 mm.

Methods
The outcomes, denoted as y i ð Þ , where i ð Þ is the index of a sample consisting of 19,688 values, represented weekly cumulative rainfall erosivity as calculated from pluviograph data. The features included the weekly cumulative rainfall of erosive events, the month to which the above individual m values are referred, the altitude of the meteorological stations, and the number of the days of the week for which rainfall is recorded. The matrix of the features is denoted as x (i) , while h(x (i) ) denotes the output vector corresponding to the hypothesis in question (i.e., as computed either from the neural network or the empirical equations). The subscripts test and train mark quantities that belong to the testing or the training set, respectively.
The erosivity estimation problem is set up as a scheme of machine learning. The data were split using 70% of them as the training set and 30% as a testing one. As measures of the out-of-sample error were used: (a) The coefficient of determination: with this form of R 2 : • R 2 ¼ 0 means that there is no improvement comparing to a simplistic model that returns the average of the training set valuesŷ train .
• R 2 < 0 means an algorithm that makes prediction worse than the base model. (b) The root mean-squared error: (c) The mean absolute error: In Eqs. (15)- (17), m is the number of the samples in the test set, h x test i ð Þ À Á is the estimations from a trained algorithm (i.e., either from a trained neural network or from a fitted empirical equation) used as inputs, the testing set x test and y test are the outcomes from calculations by means of pluviograph data, andŷ train is the average of the training set erosivity values coming also from pluviograph data.
In order to compare neural network performance to that of empirical equations, two alternative exponential models discussed in the previous section were tried, namely, those given by Eqs. (13) and (14) and leading to optimal adjustments of two respective hypotheses. The latter were determined by minimizing the following objective function by the trust-region-reflective method [50,51]: where θ denotes the vector of the respective parameters and y (i) are the outcomes, as defined in the beginning of this section.
Neural networks aim at mimicking the function of the brain. They constitute powerful nonlinear regression methods. The result or exit of the neural network is produced via a series of intermediate nodes arranged in successive layers, characterized as hidden. These layers mediate between an initial layer of input nodes and a final layer of exit nodes. The hidden nodes define linear combinations of the data contained in the initial input layer. These linear combinations are further transformed by a nonlinear function, which possesses a continuous derivative, such as the hyperbolic tangent function: The example in Figure 6 displays three layers, of which the one to the left is the input layer, the middle one is the hidden layer, and the third one is the output layer.
In the neural network, α j ð Þ i denotes the activation of node i in layer j, and Θ j ð Þ denotes the matrix of weights that serve as coefficients of the linear transformation from layer j to layer j þ 1. The values at the output nodes in the example of Figure 6 are: The optimization of the Θ j ð Þ weights is executed in such a way as to minimize the sum of squares of the differences between computed and measured results: where m is the number of values h Θ x i ð Þ À Á obtained from the network using x i ð Þ as input data and y i ð Þ as the calculated values (i.e., the EI 30 values as computed from the pluviographic data).
This minimization is a difficult optimization problem, since there are no constraints related to these parameters. The parameters are usually initialized by random values, and in the sequel, specialized algorithms are used for their determination. A survey about neural networks and their forms as shallow and deep can be found in Schmidhuber [52] and Goodfellow et al. [53].
The architecture of the neural network used in the analysis included two hidden layers that had 32 and 16 neurons, respectively. In order to keep h Θ x i ð Þ À Á nonnegative, the second hidden layer of the neural network used the rectifier activation function [54]: The training set values x train were used to compute the averages x and standard deviation sd x ð Þ and normalize both x train and x test using the normalizing transformation: The training of the networks was performed by the method of early stopping [55] by utilizing a random validation set consisting of 10% of the training data, so as to avoid over fitting of the neural network.
The data importing, calculation of EI 30 values, and analysis were done using the R language for statistical computing and graphics [56] using the packages: hydroscoper [57], hyetor [58], and ggplot2 [59]. The open source software library TensorFlow [60] was used in order to train the neural network.

Results
The comparison of the algorithms was based on their performance on the testing set using the three different error metrics and two time steps: weekly and annual.   Table 3.
Estimation of the out-of-sample error metrics for the two empirical equations and the neural network. The weekly and the annual calculated values were produced from the aggregated erosivity values coming from the pluviograph data. The annual predicted values are also the aggregated values coming from the output of the corresponding algorithm (the predicted weekly erosivity values in our case). The annual time step was used in order to examine if there is a systematic error on the estimations of the algorithms, which was not the case. These results are given in Tables 2 and 3, where it is shown that the neural network outperformed the two parametric methods, which both had similar results. Specifically, the machine learning method had better performance that ranged from 8 to 22%, depending on the error metric and time step.
In Figures 7 and 8, the annual erosivity values are presented as calculated in the test set versus the values estimated from the 84 fitted Yu and Rosewell equations for each station and the single neural network. In these two diagrams, the neural network's estimations are closer to the calculated ones than those coming from the empirical equations.

Conclusions
The universal soil loss equation with its revisions represents the established model and the means for estimating annual long-term average soil loss. An important factor of this equation is the rainfall erosivity, R, which is closely connected to the energy carried by rainfall. Rainfall data are essential for the estimation of R. In particular, data from pluviographs are more suitable for that task. However, such data are scarce, and various empirical methods have been presented for estimating R from coarser data. These methods are reviewed in this chapter. Also, various difficulties are discussed that are associated with the issues of missing values and the separation of erosive events. In addition, indications are given of the adverse nature of the relation of erosivity to precipitation characteristics. All these facts raise objections to the suitability of empirical models and lead to the use of data-driven methods that will be able to handle more effectively the problematic nature of the data and also to discover more reliably the hidden relations between erosivity, precipitation, seasonality, and local characteristics. This is achieved, because in machine learning procedures, no human intervention or bias is involved in the selection or in the forms of these hidden nonlinearities.
More specifically, neural networks, as machine learning tools, are employed in this chapter for the estimation of erosivity, in comparison to two different empirical equations of the literature. The results demonstrate, first of all, the superior performance of the machine learning method and its ability for generalization, so as to perform equally well upon new data. Another advantage of the proposed learning algorithm is that it produces a single model accessing all available data, in contrast to empirical equations, where a researcher must fit and handle as many nonlinear equations as the number of available stations or the product of the number of the stations by the number of the used seasons, making the analysis cumbersome and prone to errors.
The latter fact is exemplified on the country-wide data from Greece, used in this chapter. The single neural network, which was produced, outperformed 84 different empirical nonlinear equations, each one of which was fitted using data from each station. The comparison of the performance was made using test data sets to which neither the empirical equations nor the neural network had any access during their training.
In conclusion, the present chapter gives an indicative and characteristic sample of the use of machine learning methods in the problems associated with erosivity. There are still many related problems, such as the spatial and temporal distribution of erosivity, with significant potential for the application of machine learning methods. The same is true of many areas of hydrology and water resources.

Conflict of interest
The authors declare no conflict of interest.