Open access peer-reviewed chapter - ONLINE FIRST

Rainfall Erosivity and Its Estimation: Conventional and Machine Learning Methods

By Konstantinos Vantas, Epaminondas Sidiropoulos and Chris Evangelides

Submitted: November 30th 2018Reviewed: March 19th 2019Published: June 28th 2019

DOI: 10.5772/intechopen.85937

Downloaded: 76

Abstract

Rainfall erosivity concerns the ability of rainfall to cause erosion on the surface of the earth. The difficulty in modeling the distribution, the size, and the terminal velocity of raindrops in relation to the detachment of soil particles led to the use of more tractable rainfall indices. Thus, in the universal soil loss equation (USLE), the coefficient of rainfall erosivity, R, was introduced. This coefficient is based on the product of the rainfall kinetic energy of a storm and its maximum 30-minute intensity. An important problem in the application of USLE and its revisions in various parts of the world concerns the computation of R, which requires pluviograph records with a length of at least 20 years. For this reason, empirical equations have been developed that are based on coarser rainfall data, such as daily, monthly, or yearly, which are available on larger spatial and temporal extents. However, the lack of denser data is dealt more effectively by means of machine learning methods. Computational systems for this purpose were recently developed based on feed-forward neural networks, yielding significantly better results.

Keywords

  • rainfall
  • erosivity
  • machine learning
  • erosivity density
  • universal soil loss equation
  • nonlinear regression
  • neural networks

1. Introduction

Soil erosion is the detachment, transport, and deposition of soil particles, and it takes place in the course of one or more processes. These processes are natural, such as rainfall with the energy that it carries, surface water runoff, wind, and gravity. The observed processes may be combined with human activities and works, such as intensive cultivations, deforestation, construction of public works, and mining activities [1].

Earth terrain is strongly influenced by erosion, which progresses in geological time, in case it is a purely natural process. Otherwise, when human activities and works are involved, the phenomenon of erosion is accelerated. This accelerated erosion may cause uncontrollable soil loss with negative consequences for human health, the natural ecosystems, the climate, as well as the economy.

The universal soil loss equation (USLE) with its revisions (RUSLE, RUSLE2) can be used to predict the average rate of soil erosion by grouping the numerous parameters that affect soil loss into a set of factors and is the prediction equation most widely used in the world [2]. According to Nearing et al. [3], “Soil loss refers to the amount of sediment that reaches the end of a specified area on a hillslope that is experiencing net loss of soil by water erosion.” Development of soil loss equations began in 1940, and the universal soil loss equation was developed at the National Runoff and Soil Loss Data Center established in 1954 by the Science and Education Administration [4].

The USLE represents an erosion model developed for the prediction of soil losses in an average long-term sense. It is based on the knowledge of the physical characteristics of the field area under study, along with the prevailing cropping and management system. USLE has been widely tested in field conditions, and therefore its validity has been established.

USLE consists of the product of six factors, whose numerical values can be specified depending on a particular location. There is a considerable variation in the resulting erosion values, if observations are limited within short periods. However, long-term averages correspond more satisfactorily to predictions.

The soil loss equation is

A=R·K·L·S·C·PE1

where the meaning of the symbols is given in [5] exactly as follows:

  • Ais the computed soil loss per unit area, expressed in the units selected for K and for the period selected for R.

  • R, the rainfall and runoff factor, is the number of rainfall erosion index units, plus a factor for runoff from snowmelt or applied water where such runoff is significant.

  • K, the soil erodibility factor, is the soil loss rate per erosion index unit for a specified soil as measured on a unit plot, which is defined as a 72.6-ft length of uniform 9% slope continuously in clean-tilled fallow.

  • L, the slope length factor, is the ratio of soil loss from the field slope length to that from a 72.6-ft length under identical conditions.

  • S, the slope-steepness factor, is the ratio of soil loss from the field slope gradient to that from a 9% slope under otherwise identical conditions.

  • C, the cover and management factor, is the ratio of soil loss from an area with specified cover and management to that from an identical area in tilled continuous fallow.

  • P, the support practice factor, is the ratio of soil loss with a support practice like contouring, strip cropping, or terracing to that with straight-row farming up and down the slope.

Out of the six factors, only two have units, the erosivity R and the erodibility K. The remaining four, i.e., slope steepness, slope length, cropping management, and components of control practice, are dimensionless factors, because they represent ratios in relation to the unit plot [4].

Regarding R, it needs to be observed that its units stem from its definition as the energy multiplied by maximum 30-minute intensity term. In contrast, the erodibility factor values were determined empirically by calibration against measured erosion data. This is important due to the fact that if there is a change on the definition of the R factor, then the values of the K factor should be recalculated [3].

2. Computation of the R factor

The R factor is computed for time periods greater than 20 years, so that wet or dry rainfall periods will be incorporated, eliminating any bias. It is set equal to the average of the sum of the erosivity values for every year’s rainfalls. The R coefficient is defined as the product of the kinetic energy of an erosive rainfall event by the maximum intensity of a 30-minute duration rainfall, during the rainfall event:

R=1nj=1nk=1mjEI30kE2

where nis the number of years in the record, mjthe number of erosive events during year j, and EI30(MJ mm ha−1 h−1) the rainfall erosivity for event k.

The erosivity of a particular event is

EI30=r=1mer·vr·I30E3

where eris the kinetic energy per unit of rainfall (MJ ha−1 mm−1); vris the rainfall depth (mm) for the time interval rof the hyetograph, which has been divided into r=1,2,,msubintervals; and I30is the maximum rainfall intensity for a 30-minute duration.

For the computation of er, numerous empirical relations (Figure 1) involving rain intensity have been proposed [6]. Thus, in USLE, the empirical relation of Wischmeier and Smith [4] was used:

Figure 1.

The three different kinetic energy equations used in USLE and its revisions. The points are actual data, showing the relation between intensity and the kinetic energy of the rainfall, converted to SI units and coming from Haan et al. [61].

er=0.119+0.0873log10iE4

with the upper limit of 0.283 MJ ha−1mm−1 if i>76 mm h−1, where iis rainfall intensity.

Later, in RUSLE [5], the exponential relation of Brown and Foster [7] was used:

er=0.2910.72e0.05iE5

Upon comparison of the equations used in USLE and RUSLE, McGregor and Mutchler [8] found out that for rain intensities up to 35 mm/h, Eq. (5) yields results by 12% less than those of Eq. (4), and they proposed a modification in the value 0.05 that controls the rate of change of er with i. Thus, in RUSLE, the following relation has been adopted:

er=0.2910.72e0.082iE6

Eq. (5) was developed for an application concerning reordered rainfall intensity data and not natural rainfall data [6]. The systematic underestimation of R has been shown in many other studies [3, 7, 8, 9], so Eq. (5) should not be used for calculations. The rules that apply in order to single out the storms causing erosion and to divide rainfalls of large duration in RUSLE2 [10, 11] are:

  1. A rainfall event is divided into two parts, if its cumulative depth for duration of 6 hours at a certain location is less than 1.27 mm.

  2. A rainfall is considered erosive if it has a cumulative value greater than 12.7 mm.

  3. All rainfalls with extreme EI30values and a return period greater than 50 years are deleted.

The current revised version of USLE, RUSLE2 [10], introduced the erosivity density (ED), as a measure of rainfall erosivity per unit rainfall to develop erosivity values for the USA, because ED requires shorter record lengths, as 10 years leads to acceptable results, allows more missing data than R, and is independent of the elevation:

EDj=k=1mjEI30kPjE7

where mjis the number of storms during a time period j, EI30kthe erosivity of storm k, and Pjthe total precipitation height for the period j(usually monthly or annual).

Vantas et al. [12] using a numerical scheme with data from Greece showed that ED is more robust against the presence of missing precipitation values, as reflected in the following results:

  • Using only 5% of the data, annual R values are underestimated on average by 85%, when the average estimation error of ED values is 50%.

  • In the presence of 50% of the data, R values are underestimated by 49%, while at the same time, the estimation error of ED is 20%.

The use of ED permits the utilization of data consisting of daily rainfall values that are more abundant in comparison to data from pluviographs. The details of the method, which was employed in the USA in combination with kriging, may be found in the RUSLE2 Science Documentation [10].

3. Methods for the estimation of R

The lack of dense time series from pluviographs gives rise to many difficulties in the application of USLE and of its revisions in many countries. In order to cope with these difficulties, many models have been developed, based on rainfall depth values for various time steps (daily, monthly, and yearly), under specific spatial parameters and climatological data. Thus, a number of researchers reported good correlation results between R and yearly rainfall in many countries of the world by using various schemes ranging from simple parametric equations to geostatistical models.

These methods were applied in the countries of West Africa [13]; Switzerland taking account of elevation, aspect, longitude, and altitude [14]; the USA [15]; India [16]; Spain [17]; Italy [18]; China taking account of maximal rainfalls of 1-hour and 1-day durations per year [19]; Korea [20]; Mexico [21]; Honduras in combination with elevation [22]; Rhodesia [23]; and Hawaii [24] as well as for the development of a European [25] and Universal model [26].

A different model followed in several countries concerns the correlation, by the use of parametric equations, of the yearly values of R to the monthly values of rainfall depth. Those equations were applied in Venezuela [27], Germany [28], the USA [15], Italy [29], Iran in combination with the yearly and the maximum daily rainfall per year [30], North Africa [31], Morocco [32], Nigeria [33], South Italy and Southeast Australia [34], Uruguay [35], and Sudan [36].

Finally, for the estimation of the yearly value of R, several researchers used parametric equations for the initial determination of the daily R value and the corresponding daily rainfall depths. Those equations were applied in the Eastern USA [37], Australia [38], Spain [17], Kenya [39], China [40], Nigeria [41], North Africa [42], South Italy [43], Peru [44], and Slovenia [45].

Indicatively, some of the empirical equations of the literature are given below in dimensionless form, representing various parametrizations and methodologies. In West African countries, Roose [13] developed a simple linear relation between yearly values of R and rainfall P:

R=α1+β1·PE8

where α1και β1are linear regression parameters.

In the USA, Renard and Freimund [15] proposed the following nonlinear equations for continental regions of the country, in which no rainfall intensity data exist, also for yearly values of R and rainfall P:

R=α2·Pβ2E9

or

R=α3+β3·Pγ3E10

where α2, β2, α3, β3, and γ3are nonlinear regression parameters.

In Morocco, Arnoldus [32] used the modified Fournier index, which is equal to

MFI=i=112pm,iPE11

where P is the yearly rainfall and pm,ithe monthly rainfall from January to December, and developed the formula

R=α5·MFIβ5E12

where α5 and β5 are nonlinear regression parameters.

Richardson et al. [37] used the daily rainfall values for the estimation of the corresponding daily erosivity values Rd:

Rd=αm,s·Pdβm,sE13

where the parameters αm,s and βm,s correspond to station s and to month m and are evaluated by means of nonlinear regression.

A serious issue with the applicability of Eq. (13) is the difficulty to compute its coefficients in dry areas or months due to small number of erosive events. Another issue, even in wet periods, is the amount of missing values that also leads to a small set of observations and the same problems [46].

In order to reduce the computation burden of Eq. (13), Yu and Rosewell [38] proposed the following empirical equation:

Rd=αs·1+βs·cosπ6mωPdγsE14

where the parameters αs, βs, and γs are different only for the different stations s and ω is the month with the highest average of daily R values.

It must be noted that the logarithmic transformation and the subsequent application of linear regression must be avoided in the above equations, because in that case the minimization is applied to the average of the differences of the logarithms of the aggregated daily EI30values (coming from pluviograph data) and the logarithms of the estimated Rd values (coming from daily rainfall data) and not of the corresponding absolute values themselves. Specifically, the application of the log transformation to Eq. (13) resulted in a −10% systematic error, when data from RUSLE2 was used, as noted on p. 59 of [10].

A different approach for the estimation of R, in the presence of missing values, was followed by Vantas and Sidiropoulos [47]. They compared empirical equations to machine learning methods, extracting for the latter methods better results, in terms of statistical significance, compared to the former methods.

4. Utilization of neural networks for the estimation of R

4.1 Data

The data utilized in the analysis were taken from the Greek National Bank of Hydrological and Meteorological Information [48] and came from 84 meteorological stations (Figure 2). By adding up the years of record registered on the various stations, the time series comprised a total of 2425 years of pluviograph records, with an average of 28.9 years per station. The time step was 30 min for the time period from 1953 to 1997, and the data coverage was equal to 56%.

Figure 2.

Stations’ location coming from Greece used in the analysis.

For the above described data, it was deemed useful to change the time step and aggregate the data into weekly values, because 57% of the recordings were associated with storms occupying time periods covering parts of more than one calendar day, although only 17% of the storms had duration of more than 24 hours.

Under the time step of 1 week, it was found out that 80% of the values emanated from a single storm. The storms that were crossed temporally by 2 consecutive weeks were assigned to the first of the 2 weeks, and they comprised only 7% of the data. Thus, through the use of weekly instead of daily values, divisions of storms due to time step were prevented, and a measure of rainfall duration in days has been added to the observations.

4.2 Exploratory data analysis

Table 1 gives the concise statistics of the cumulative rainfall, the duration, and the erosivity EI30of the erosive events that resulted following RUSLE2 rules.

VariableMinMeanMedianMaxSDSWKRCV
EI30 (MJ mm ha−1 h−1)0.789.144.23845.0157.47.6101.01.7
Precipitation (mm)12.729.422.2322.421.93.419.80.7
Duration (h)0.513.010.5152.510.52.513.60.8

Table 1.

Average statistical properties of cumulative rainfall, the duration and EI30of the erosive events.

SD is an abbreviation for standard deviation, SW for skew, KR for kurtosis, and CV for coefficient of variation.

In the sequel, a series of three figures is given that contain:

  1. The relation of the precipitation height of the erosive events to the EI30value in Figure 3, where a linear relation appears with a wide opening between the logarithms of rainfall values and the logarithms of EI30.

  2. The relation of EI30to the months in Figure 4, where it turns out that in July its maximum appears with an almost sine-like variation of the median values per month.

  3. The relation of the duration of erosive events to EI30in Figure 5, where it is shown that for the selected duration widths, the relation to the median values of EI30is parabolic.

Figure 3.

Relation of cumulative precipitation of the erosive events to the EI30. The red line is the LOESS line.

Figure 4.

Relation of cumulative precipitation of the erosive events to the EI30. The red line is the LOESS line that is fitted using the median values of EI30 per month.

Figure 5.

Relation of cumulative precipitation of the erosive events to the EI30. The red line is the LOESS line that is fitted using the median values of EI30 per duration width.

In these three figures, local polynomial regression fitting (LOESS, [49]) was used as a nonparametric method to produce smooth curves through the plotted variables. In that method, the fitting is done locally for a given point x using all the points in the neighborhood of x and the distances of these points from x as weights.

From the above diagrams and from the statistics of the erosive events, it can be seen that the estimation of EI30values is a nonlinear problem with skewed data and with expected results of a relatively low accuracy. This point is particularly reinforced by Figure 3, in which it is shown that for a specific rainfall depth, such as 30 mm, a very large range of EI30values, from about 10 to 700 MJ mm ha−1 h−1, is to be assigned, corresponding to the erosive events connected to 30 mm.

4.3 Methods

The outcomes, denoted as yi, where iis the index of a sample consisting of 19,688 values, represented weekly cumulative rainfall erosivity as calculated from pluviograph data. The features included the weekly cumulative rainfall of erosive events, the month to which the above individual m values are referred, the altitude of the meteorological stations, and the number of the days of the week for which rainfall is recorded. The matrix of the features is denoted as x(i), while h(x(i)) denotes the output vector corresponding to the hypothesis in question (i.e., as computed either from the neural network or the empirical equations). The subscripts test and train mark quantities that belong to the testing or the training set, respectively.

The erosivity estimation problem is set up as a scheme of machine learning. The data were split using 70% of them as the training set and 30% as a testing one. As measures of the out-of-sample error were used:

  1. The coefficient of determination:

    R2=1i=1mhxtestiytesti2i=1mŷtrainytesti2E15

with this form of R2:

  • R2=0means that there is no improvement comparing to a simplistic model that returns the average of the training set values ŷtrain.

  • R2=1means a perfect algorithm.

  • R2<0means an algorithm that makes prediction worse than the base model.

  • The root mean-squared error:

    RMSE=1mi=1mhxtestiytesti2E16

  • The mean absolute error:

    MAE=1mi=1mhxtestiytestiE17

  • In Eqs. (15)(17), mis the number of the samples in the test set, hxtestiis the estimations from a trained algorithm (i.e., either from a trained neural network or from a fitted empirical equation) used as inputs, the testing set xtestand ytestare the outcomes from calculations by means of pluviograph data, and ŷtrainis the average of the training set erosivity values coming also from pluviograph data.

    In order to compare neural network performance to that of empirical equations, two alternative exponential models discussed in the previous section were tried, namely, those given by Eqs. (13) and (14) and leading to optimal adjustments of two respective hypotheses. The latter were determined by minimizing the following objective function by the trust-region-reflective method [50, 51]:

    Jθ=12mi=1mhθxiyi2E18

    where θdenotes the vector of the respective parameters and y(i) are the outcomes, as defined in the beginning of this section.

    Neural networks aim at mimicking the function of the brain. They constitute powerful nonlinear regression methods. The result or exit of the neural network is produced via a series of intermediate nodes arranged in successive layers, characterized as hidden. These layers mediate between an initial layer of input nodes and a final layer of exit nodes. The hidden nodes define linear combinations of the data contained in the initial input layer. These linear combinations are further transformed by a nonlinear function, which possesses a continuous derivative, such as the hyperbolic tangent function:

    gx=tanhxE19

    The example in Figure 6 displays three layers, of which the one to the left is the input layer, the middle one is the hidden layer, and the third one is the output layer. In the neural network, αijdenotes the activation of node iin layer j, and Θjdenotes the matrix of weights that serve as coefficients of the linear transformation from layer jto layer j+1. The values at the output nodes in the example of Figure 6 are:

    Figure 6.

    A neural network with a single hidden layer.

    α12=gΘ101·x0+Θ111·x1+Θ121·x2+Θ131·x3
    α22=gΘ201·x0+Θ211·x1+Θ221·x2+Θ231·x3
    α32=gΘ301·x0+Θ311·x1+Θ321·x2+Θ331·x3
    α42=gΘ401·x0+Θ411·x1+Θ421·x2+Θ431·x3
    α13=gΘ102·α02+Θ112·α12+Θ122·α22+Θ132·α32+Θ142·α42E20

    The optimization of the Θjweights is executed in such a way as to minimize the sum of squares of the differences between computed and measured results:

    JΘ=12mi=1mhΘxiyi2E21

    where m is the number of values hΘxiobtained from the network using xias input data and yias the calculated values (i.e., the EI30values as computed from the pluviographic data).

    This minimization is a difficult optimization problem, since there are no constraints related to these parameters. The parameters are usually initialized by random values, and in the sequel, specialized algorithms are used for their determination. A survey about neural networks and their forms as shallow and deep can be found in Schmidhuber [52] and Goodfellow et al. [53].

    The architecture of the neural network used in the analysis included two hidden layers that had 32 and 16 neurons, respectively. In order to keep hΘxinonnegative, the second hidden layer of the neural network used the rectifier activation function [54]:

    gx=max0xE22

    The training set values xtrainwere used to compute the averages x̄and standard deviation sdxand normalize both xtrainand xtestusing the normalizing transformation:

    Nxi=xix¯sdxE23

    The training of the networks was performed by the method of early stopping [55] by utilizing a random validation set consisting of 10% of the training data, so as to avoid over fitting of the neural network.

    The data importing, calculation of EI30values, and analysis were done using the R language for statistical computing and graphics [56] using the packages: hydroscoper [57], hyetor [58], and ggplot2 [59]. The open source software library TensorFlow [60] was used in order to train the neural network.

    4.4 Results

    The comparison of the algorithms was based on their performance on the testing set using the three different error metrics and two time steps: weekly and annual. The weekly and the annual calculated values were produced from the aggregated erosivity values coming from the pluviograph data. The annual predicted values are also the aggregated values coming from the output of the corresponding algorithm (the predicted weekly erosivity values in our case). The annual time step was used in order to examine if there is a systematic error on the estimations of the algorithms, which was not the case.

    These results are given in Tables 2 and 3, where it is shown that the neural network outperformed the two parametric methods, which both had similar results. Specifically, the machine learning method had better performance that ranged from 8 to 22%, depending on the error metric and time step.

    MethodRMSEMAER2
    Richardson et al.141.9465.460.59
    Yu and Rosewell140.3664.330.60
    Neural network116.8354.330.73

    Table 2.

    Estimation of the out-of-sample error metrics for the two empirical equations and the neural network.

    R2 values are unitless and RMSE and MAE values are in MJ mm ha−1 h−1. The metrics refer to the weekly erosivity time step.

    MethodRMSEMAER2
    Richardson et al.261.73144.630.76
    Yu and Rosewell259.80141.480.76
    Neural network223.01124.240.82

    Table 3.

    Estimation of the out-of-sample error metrics for the two empirical equations and the neural network.

    R2 values are unitless and RMSE and MAE values are in MJ mm ha−1 h−1y−1. The values are calculated using the estimation of the annual erosivity values.

    In Figures 7 and 8, the annual erosivity values are presented as calculated in the test set versus the values estimated from the 84 fitted Yu and Rosewell equations for each station and the single neural network. In these two diagrams, the neural network’s estimations are closer to the calculated ones than those coming from the empirical equations.

    Figure 7.

    Calculated annual erosivity values coming from pluviograph data versus predicted values using the Yu and Rosewell model on the testing set. With red line, the identity function f(x) = x is symbolized.

    Figure 8.

    Calculated annual erosivity values coming from pluviograph data versus predicted values using the neural network on the testing set. With red line, the identity function f(x) = x is symbolized.

    5. Conclusions

    The universal soil loss equation with its revisions represents the established model and the means for estimating annual long-term average soil loss. An important factor of this equation is the rainfall erosivity, R, which is closely connected to the energy carried by rainfall. Rainfall data are essential for the estimation of R. In particular, data from pluviographs are more suitable for that task. However, such data are scarce, and various empirical methods have been presented for estimating R from coarser data. These methods are reviewed in this chapter.

    Also, various difficulties are discussed that are associated with the issues of missing values and the separation of erosive events. In addition, indications are given of the adverse nature of the relation of erosivity to precipitation characteristics. All these facts raise objections to the suitability of empirical models and lead to the use of data-driven methods that will be able to handle more effectively the problematic nature of the data and also to discover more reliably the hidden relations between erosivity, precipitation, seasonality, and local characteristics. This is achieved, because in machine learning procedures, no human intervention or bias is involved in the selection or in the forms of these hidden nonlinearities.

    More specifically, neural networks, as machine learning tools, are employed in this chapter for the estimation of erosivity, in comparison to two different empirical equations of the literature. The results demonstrate, first of all, the superior performance of the machine learning method and its ability for generalization, so as to perform equally well upon new data. Another advantage of the proposed learning algorithm is that it produces a single model accessing all available data, in contrast to empirical equations, where a researcher must fit and handle as many nonlinear equations as the number of available stations or the product of the number of the stations by the number of the used seasons, making the analysis cumbersome and prone to errors.

    The latter fact is exemplified on the country-wide data from Greece, used in this chapter. The single neural network, which was produced, outperformed 84 different empirical nonlinear equations, each one of which was fitted using data from each station. The comparison of the performance was made using test data sets to which neither the empirical equations nor the neural network had any access during their training.

    In conclusion, the present chapter gives an indicative and characteristic sample of the use of machine learning methods in the problems associated with erosivity. There are still many related problems, such as the spatial and temporal distribution of erosivity, with significant potential for the application of machine learning methods. The same is true of many areas of hydrology and water resources.

    Conflict of interest

    The authors declare no conflict of interest.

    Download

    chapter PDF

    © 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

    How to cite and reference

    Link to this chapter Copy to clipboard

    Cite this chapter Copy to clipboard

    Konstantinos Vantas, Epaminondas Sidiropoulos and Chris Evangelides (June 28th 2019). Rainfall Erosivity and Its Estimation: Conventional and Machine Learning Methods [Online First], IntechOpen, DOI: 10.5772/intechopen.85937. Available from:

    chapter statistics

    76total chapter downloads

    More statistics for editors and authors

    Login to your personal dashboard for more detailed statistics on your publications.

    Access personal reporting

    We are IntechOpen, the world's leading publisher of Open Access books. Built by scientists, for scientists. Our readership spans scientists, professors, researchers, librarians, and students, as well as business professionals. We share our knowledge and peer-reveiwed research papers with libraries, scientific and engineering societies, and also work with corporate R&D departments and government entities.

    More about us