Rainfall Erosivity: Gap-Filling Method Differences in the Brazilian Pantanal and Cerrado Biomes

To improve soil use and conservation, precipitation data are necessary. In the Universal Soil Loss Equation (USLE), historical precipitation series are a main input, but these series contain gaps that need to be filled. This study used the weighted likelihood, multiple regression, and weighted likelihood based on multiple regression methods to fill the gaps in the rainfall data from rainfall gauges in two Brazilian biomes (Cerrado and Pantanal, municipalities of Campo Grande, Bandeirantes, Sidrolândia, Miranda, Fazenda Ponte, and Ribas do Rio Pardo). With the gaps filled, it became possible to calculate the rainfall erosivity (R factor in the USLE). The consistency of the filled rainfall data was analyzed by the double mass method. The calculated rainfall erosivity varies from 2304.80 to 13562.10 MJ mm ha⁻¹ h⁻¹ year⁻¹. With these data, it was possible to identify differences in the rainfall erosivity results between methods. Comparing all the gap-filling methods, differences of 0–12% at the same rainfall gauge were obtained.


Introduction
Climate change is increasingly notable throughout the world, and scientific studies are being developed in response, with rainfall and its historical series as one of the main subjects [1]. Historical rainfall series often contain gaps, most commonly caused by equipment failure or observer error. These gaps can occur in hourly, monthly, or annual data. In some situations, they make it impossible to use the data [2].
To fill these gaps, several methods were developed and are often used in studies. These include the artificial neural network (ANN) method [3][4][5], the weighted likelihood method [6,7], the multiple regression method [8,9], and the weighted likelihood based on multiple regression method [1,2].
With the development of these methods, it became necessary to analyze the consistency of the data after its gaps were filled. For this purpose, [10] developed the double mass method.
A continuous historical rainfall series, with filled gaps and verified consistency, can be applied in many fields, such as urban drainage, soil conservation, and water conservation. In soil conservation, many studies have addressed soil loss due to water erosion [11][12][13], which is described by the equation proposed by [14]. This equation considers soil erodibility, the topographic factor, soil use and management, conservation practices, and rainfall erosivity.
The rainfall erosivity (Rc) is calculated from historical rainfall series, and the continuous series obtained can differ depending on the gap-filling method used. This study therefore aims to analyze the differences in rainfall erosivity calculated from rainfall data filled by three methods (weighted likelihood, multiple regression, and weighted likelihood based on multiple regression) and to obtain a better correlation coefficient between different hydrological data sources (radar, satellite, and local).

Area of study
To fill the gaps in a historical series, auxiliary rainfall gauges are used. For each gap in the series, it is advised to use at least three values from auxiliary rainfall gauges [2]. Therefore, six rainfall gauges (Figure 1) were used: Campo Grande, Bandeirantes, Sidrolândia, Miranda, Fazenda Ponte, and Ribas do Rio Pardo. These stations are located in the Paraná River Basin, in the central area of the state of Mato Grosso do Sul, Brazil. With the stations defined, historical series were collected from the databases of the Agência Nacional de Águas (ANA) and the Instituto Nacional de Meteorologia (INMET). The period with the fewest gaps was from January 1, 2001, to December 31, 2014, and three methods were used to fill the gaps.

Weighted likelihood method
As can be seen in [2], the month without data is filled with Eq. (1):

$$P_x = \frac{1}{n} \sum_{i=1}^{n} \frac{N_x}{N_i} P_i \qquad (1)$$

where: $P_x$ = data to be filled (mm), $n$ = number of auxiliary stations, $N_x$ = annual average rainfall at the station without data (mm), $N_i$ = annual average rainfall at the auxiliary station (mm), $P_i$ = rainfall at the auxiliary station in the month to be filled (mm).
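The calculation can be illustrated with a minimal sketch. It assumes Eq. (1) has the usual regional-weighting form, P_x = (1/n) Σ (N_x / N_i) P_i, which is consistent with the symbols defined above. The function name and the rainfall values are hypothetical and serve only as an illustration.

```python
def fill_weighted(n_x, aux):
    """Estimate a missing monthly rainfall P_x (mm).

    n_x : annual average rainfall at the station without data (mm)
    aux : list of (N_i, P_i) pairs, one per auxiliary station, where
          N_i is the annual average rainfall (mm) and P_i is the rainfall
          in the month to be filled (mm)
    """
    if not aux:
        raise ValueError("at least one auxiliary station is required")
    n = len(aux)
    # Each auxiliary value is rescaled by the ratio of annual averages.
    return sum(n_x / n_i * p_i for n_i, p_i in aux) / n


# Made-up values for three auxiliary stations (not data from this study).
p_x = fill_weighted(1500.0, [(1400.0, 140.0), (1600.0, 160.0), (1500.0, 120.0)])
print(round(p_x, 1))
```

The ratio N_x / N_i corrects for auxiliary stations that are systematically wetter or drier than the station being filled.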

Multiple regression method
As pointed out in [15], the multiple regression method establishes a relation between the auxiliary stations and the station with the data gap, using Eq. (2):

$$y_c = a_0 x_{1i} + a_1 x_{2i} + \cdots + a_{n-1} x_{ni} + a_n \qquad (2)$$

where: $y_c$ = data to be filled (mm), $n$ = number of auxiliary stations, $x_{1i}, \ldots, x_{ni}$ = rainfall at the auxiliary stations in the month to be filled (mm), $a_0, \ldots, a_n$ = coefficients to be estimated by multiple regression.
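A minimal sketch of this step is given below, assuming the coefficients are estimated by ordinary least squares on the months in which all stations have data. The function name and the synthetic values are hypothetical; the study does not state which software or estimator was used.

```python
import numpy as np


def regression_fill(aux_complete, target_complete, aux_gap):
    """Fit y = a_0*x_1 + ... + a_(n-1)*x_n + a_n and predict the missing month.

    aux_complete    : (months, n) rainfall at the n auxiliary stations (mm)
    target_complete : (months,) rainfall at the gap station, same months (mm)
    aux_gap         : (n,) auxiliary rainfall in the month to be filled (mm)
    """
    X = np.asarray(aux_complete, dtype=float)
    y = np.asarray(target_complete, dtype=float)
    # A column of ones is appended so the last coefficient is the constant a_n.
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    estimate = float(np.append(np.asarray(aux_gap, dtype=float), 1.0) @ coef)
    return estimate, coef


# Synthetic example: the target follows 0.5*x1 + 0.3*x2 + 10 exactly.
aux = [[100, 80], [50, 60], [200, 150], [20, 40], [120, 90]]
target = [0.5 * a + 0.3 * b + 10 for a, b in aux]
estimate, coef = regression_fill(aux, target, [90, 70])
```

With real data the fit is not exact, and a regression can return a negative estimate for a dry month, so the result should be checked before it is written into the series.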

Weighted likelihood based on multiple regression (mixed)
As can be seen in [16], the weighted likelihood based on multiple regression method combines both of the previous methods. For brevity, it is referred to as the mixed method. First, a regression between the gap station and each auxiliary station is calculated separately, and then the weight of each auxiliary station relative to the gap station is calculated using Eq. (3):

$$W_{yx_j} = \frac{r_{yx_j}}{\sum_{k=1}^{n} r_{yx_k}} \qquad (3)$$

where: $W_{yx_j}$ = weight factor between the gap station $y$ and the auxiliary station $x_j$, $r_{yx_j}$ = correlation coefficient between the gap station and the auxiliary station (linear regression coefficient), $n$ = number of auxiliary stations.
For each auxiliary station, a value of $W_{yx_j}$ was obtained. The sum of all weight factors must be equal to 1. The data for the gap is then calculated with Eq. (4):

$$y_c = \sum_{j=1}^{n} W_{yx_j} x_j \qquad (4)$$

where: $y_c$ = station data to be filled (mm), $x_j$ = rainfall data at the auxiliary station in the month of the data to be filled (mm), $W_{yx_j}$ = weight factor between the gap station and the auxiliary station.
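The two steps of the mixed method can be sketched as follows. The sketch assumes that each weight is the station's correlation coefficient divided by the sum of the correlation coefficients of all auxiliary stations, which satisfies the requirement that the weights sum to 1. The function name and the sample values are hypothetical.

```python
import numpy as np


def mixed_fill(target_complete, aux_complete, aux_gap):
    """Weight each auxiliary station by its correlation with the gap station.

    target_complete : (months,) rainfall at the gap station (mm)
    aux_complete    : (months, n) rainfall at the auxiliary stations (mm)
    aux_gap         : (n,) auxiliary rainfall in the month to be filled (mm)
    """
    y = np.asarray(target_complete, dtype=float)
    X = np.asarray(aux_complete, dtype=float)
    # Correlation coefficient between the gap station and each auxiliary station.
    r = np.array([np.corrcoef(y, X[:, j])[0, 1] for j in range(X.shape[1])])
    weights = r / r.sum()  # normalized so the weights sum to 1
    estimate = float(weights @ np.asarray(aux_gap, dtype=float))
    return estimate, weights


# Made-up series: both auxiliary stations are perfectly correlated with the
# gap station, so each receives a weight of 0.5.
y_obs = [10.0, 20.0, 30.0, 40.0]
x_obs = [[10.0, 20.0], [20.0, 40.0], [30.0, 60.0], [40.0, 80.0]]
estimate, weights = mixed_fill(y_obs, x_obs, [50.0, 100.0])
```

The sketch assumes positive correlations between stations; a negative or near-zero coefficient would need separate handling, which the text does not describe.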

Double mass method
With the data gaps filled by all three methods, it was necessary to analyze the data consistency.
For this purpose, the double mass method, described by [10] and applied in [17][18][19], was used. It compares two rainfall gauges using the cumulative rainfall over the period of study, plotting a reliable station on the x-axis and the station to be compared on the y-axis. The resulting chart tends to be a straight line, whose slope represents the relationship between the reliable station and the station being analyzed.
In some cases, the dots may not follow a straight line, or some ranges of the chart may show slopes different from that of the line found by linear regression. These differences may be due to failures in obtaining the data, a change of observer at the station, changes in the environment near the station, or relocation of the monitoring equipment [10].
To analyze the consistency, the Sidrolândia station was considered reliable and used as a reference, since it had only one gap during the period of study.
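A double mass check can be sketched as below. The functions, the split into two equal segments, and the rainfall values are assumptions made for illustration; the study does not specify how slope changes were identified beyond inspecting the chart.

```python
import numpy as np


def double_mass(reference, candidate):
    """Cumulative rainfall of both stations, fitted slope, and R^2."""
    x = np.cumsum(np.asarray(reference, dtype=float))
    y = np.cumsum(np.asarray(candidate, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return x, y, slope, r2


def segment_slopes(x, y, parts=2):
    """Slope of each consecutive segment; unequal slopes suggest a break."""
    chunks = np.array_split(np.arange(len(x)), parts)
    return [float(np.polyfit(x[i], y[i], 1)[0]) for i in chunks]


# Made-up series: the candidate station records 50% more rain in the second half.
reference = [100.0] * 10
candidate = [100.0] * 5 + [150.0] * 5
x, y, slope, r2 = double_mass(reference, candidate)
slopes = segment_slopes(x, y)
```

In this made-up series R² stays above 0.98 even though the segment slopes are 1.0 and 1.5, which shows why a high R² alone does not establish consistency.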

Rainfall erosivity
To obtain the value of rainfall erosivity, Eq. (5), described by [20], was used:

$$R_c = \frac{p^2}{P} \qquad (5)$$

where: $p$ = monthly rainfall (mm), $P$ = annual average rainfall (mm). The parameters of Eq. (5) for application in Campo Grande were defined by [21], as can be seen in [11]. These parameters are shown in Eq. (6), where: $R$ = rainfall erosivity factor (MJ mm ha⁻¹ h⁻¹ year⁻¹), $p$ = monthly rainfall (mm), $P$ = annual average rainfall (mm).
With this equation the rainfall erosivity for each method and for each rainfall gauge was calculated. After that, the rainfall erosivity for each station was classified for each method used.
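The rainfall coefficient of Eq. (5) can be computed as in the sketch below. The fitted relation of Eq. (6) is not reproduced here, so it is passed in as a hypothetical callable `rc_to_r` that the reader must supply from [21]. Summing the monthly values to obtain an annual total is likewise an assumption of this sketch.

```python
def rainfall_coefficient(monthly_mm, annual_mean_mm):
    """Eq. (5): R_c = p^2 / P, with p and P in mm."""
    if annual_mean_mm <= 0:
        raise ValueError("annual average rainfall must be positive")
    return monthly_mm ** 2 / annual_mean_mm


def annual_erosivity(monthly_series_mm, annual_mean_mm, rc_to_r):
    """Sum the monthly erosivity over one year.

    rc_to_r : callable mapping R_c to monthly erosivity; it stands in for
              Eq. (6), whose parameters are not given in this sketch.
    """
    return sum(
        rc_to_r(rainfall_coefficient(p, annual_mean_mm))
        for p in monthly_series_mm
    )


# Made-up value: 150 mm in a month, 1500 mm annual average.
rc = rainfall_coefficient(150.0, 1500.0)
print(rc)
```

Keeping the fitted relation as a separate argument makes it easy to rerun the same filled series with coefficients calibrated for another location.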
Table 1 was used to classify the rainfall erosivity level. Source: [22], converted to SI units according to [23].

Gap filling
With the multiple regression method, the dots tended toward a straight regression line, with R² values above 0.9938. With the weighted likelihood based on multiple regression method, the dots likewise tended toward a straight regression line, with R² values above 0.9938.
Even though the dispersion of the dots had a satisfactory R² value for all methods, the double mass method shows consistency at all stations except Fazenda Ponte, where the dots showed changes in slope along the regression line for all methods.
As explained by [10], these differences can arise from factors such as a change of monitoring equipment operator, changes in the environment near the station, or relocation of the monitoring equipment.

Calculating the rainfall erosivity
With the gaps filled and continuous historical rainfall series available, the rainfall erosivity was calculated for each station and for each method. The values obtained for Campo Grande can be seen in Table 2.
The year with the highest rainfall erosivity at the Campo Grande station was 2013 according to the weighted likelihood method and 2011 for the other two methods. The year with the lowest rainfall erosivity was 2002 for all methods. The consistency analysis can be seen in Figure 2.
For all methods, the dots tend toward a straight regression line with a constant slope. The rainfall erosivity in Bandeirantes can be seen in Table 3.
For the Bandeirantes station, the years of maximum and minimum rainfall erosivity were the same for all methods, 2011 and 2002, respectively. Furthermore, the annual rainfall erosivity differs by 5% (between weighted likelihood and weighted likelihood based on multiple regression). The consistency analysis can be seen in Figure 3.
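The percentage difference between methods at one station can be computed as in the sketch below. The text does not state which value was used as the base of the percentage, so taking the smallest value as the base is an assumption, and the erosivity values shown are made up.

```python
def max_percent_difference(values):
    """Largest spread between methods, as a percentage of the smallest value."""
    lowest, highest = min(values), max(values)
    if lowest <= 0:
        raise ValueError("erosivity values must be positive")
    return (highest - lowest) / lowest * 100.0


# Made-up annual erosivity for one station under two gap-filling methods.
diff = max_percent_difference([8000.0, 8400.0])
print(round(diff, 1))
```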
In the double mass comparison, the same dispersion pattern of the dots can be noticed for all three methods. They tend toward a straight regression line, showing consistency in the filled gaps. The rainfall erosivity in Sidrolândia can be seen in Table 4.
The year 2002 had the lowest rainfall erosivity and 2003 the highest, for all methods. The rainfall erosivity in Miranda can be seen in Table 5.
The data consistency of the Miranda station can be verified through the dispersion of its dots along a straight regression line with a constant slope. The rainfall erosivity at the Fazenda Ponte station can be seen in Table 6.
The year with the highest rainfall erosivity at the Fazenda Ponte station was 2007 (weighted likelihood and multiple regression) or 2003 (weighted likelihood based on multiple regression), and the year with the lowest rainfall erosivity was 2008 (weighted likelihood and multiple regression) or 2009 (weighted likelihood based on multiple regression). For this station, the largest percentage difference between methods was 12%. The double mass analysis can be seen in Figure 5.
Analyzing the dispersion of the dots for this station, the slope changes along some parts of the chart, even though the statistical fit is satisfactory (R² = 0.9936). These changes along the regression line indicate the inconsistencies described by [10]. They can be explained by factors such as a change of monitoring station operator, environmental changes near the station, and relocation of the monitoring equipment. The rainfall erosivity in Ribas do Rio Pardo can be seen in Table 7.
The year with the highest rainfall erosivity at the Ribas do Rio Pardo station was 2013, and the year with the lowest was 2008. This rainfall gauge had the lowest percentage difference among all the stations (1%). The double mass analysis can be seen in Figure 6.
The data of the Ribas do Rio Pardo station are consistent; the dots tend toward a straight regression line, with no change in slope along the line.
The erosivity values found in this study are in agreement with [24].
However, in that study erosivity is classified only as high, whereas with gap filling the classification in the present study ranges from moderate to very high. This can be explained by the fact that [24] did not mention gap filling and used fewer stations in the study area (Cerrado and Pantanal biomes).

Rainfall erosivity classification
After filling the gaps and analyzing the consistency of all rainfall gauges, the rainfall erosivity was classified according to Table 1. The classification of rainfall erosivity at all stations with gaps filled by the weighted likelihood method can be seen in Table 8.
The classification of rainfall erosivity using the multiple regression method to fill gaps can be seen in Table 9. The classification results for the weighted likelihood based on multiple regression method can be seen in Table 10.
After the classification was finished for each method, the Bandeirantes and Fazenda Ponte stations showed different rainfall erosivity classifications between the methods. Bandeirantes was classified as very high erosivity (weighted likelihood based on multiple regression) and as high erosivity (weighted likelihood and multiple regression). The Fazenda Ponte station was classified as moderate to high erosivity (weighted likelihood based on multiple regression) and as high erosivity (weighted likelihood and multiple regression).
The results reported here may help to identify a more adequate methodology to fill the gaps in the region. They can support plans and projects that define the best land-use strategies to improve water and soil conservation and quality and to promote agricultural sustainability. This is especially important because the Cerrado is one of the Brazilian biomes that has been subjected to the highest agronomic pressure, according to [25], and it interacts strongly with the Pantanal biome.

Rainfall data sources combined: satellite, radar, and local
When studying rainfall, several variables need to be determined, for example, the size of the area, the duration of the historical rainfall series, and the data source. Three kinds of sources are currently available: satellite, radar, and local. An important criterion when choosing among them is the size of the study area. When the study area is a state, a country, or a continent, satellite data achieve better accuracy. If the study area is a state or an area with a few cities, radar data are more advisable. Finally, if the study area is a city, a small watershed, or a few cities, local data (rainfall gauges) provide a better result.
Satellite data can be obtained from Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR) [26], the Tropical Rainfall Measuring Mission (TRMM) [27], and the Gravity Recovery and Climate Experiment (GRACE) [28]. An important characteristic is the rainfall amount; some studies show that the same satellite data can provide different accuracy depending on the rainfall amount. In developing countries the number of local rainfall gauges is low, and some studies need to combine two data sources.
Satellite data are widely used to calculate hydrological parameters for areas that are sparsely equipped with rain gauges, making it possible to obtain data for a large area. On the other hand, their accuracy for high rainfall amounts is low; for example, [26] describes a correlation coefficient of 0.62 between PERSIANN-CDR and local gauge data in a heavy rainfall event.
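A correlation coefficient between two data sources can be computed as in the sketch below. It assumes the Pearson coefficient and paired values for the same dates, skipping any date where either source is missing. The function name and the sample values are hypothetical, and this is not necessarily the procedure used in [26].

```python
import numpy as np


def pearson_r(satellite_mm, gauge_mm):
    """Pearson correlation between paired satellite and gauge rainfall (mm)."""
    s = np.asarray(satellite_mm, dtype=float)
    g = np.asarray(gauge_mm, dtype=float)
    if s.shape != g.shape:
        raise ValueError("series must have the same length")
    # Keep only the dates where both sources have a value.
    valid = ~(np.isnan(s) | np.isnan(g))
    if valid.sum() < 2:
        raise ValueError("at least two paired values are required")
    return float(np.corrcoef(s[valid], g[valid])[0, 1])


# Made-up paired values; the third satellite value is missing and is skipped.
r = pearson_r([1.0, 2.0, float("nan"), 4.0], [2.0, 4.0, 6.0, 8.0])
```

Comparing the coefficient before and after gap filling of the local series is one way to judge whether a filling method brings the sources closer together.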
Consequently, each data set requires dedicated study to obtain the best combination for a given application. This paper provides an approach to obtain more accurate data through different gap-filling methods. More studies on gap filling of sensor data are needed, to provide future work with improved methods and combinations.

Conclusions and recommendations
1. The calculated rainfall erosivity varies from 2304.80 to 13562.10 MJ mm ha⁻¹ h⁻¹ year⁻¹. Variations in rainfall erosivity classification were identified; comparing all the gap-filling methods, differences of 0-12% at the same rainfall gauge were obtained.

2. In the double mass analysis, even if the statistical fit is satisfactory and the dots tend toward a straight regression line, changes in slope along the line should be considered.

3. The consistency analysis can explain the different results obtained. The Fazenda Ponte station was an example where a break in the slope was found and the results obtained diverged by 12%.

4. The weighted likelihood and multiple regression methods had similar performances in filling gaps; the rainfall erosivity values had a 2% maximum difference.

5. The weighted likelihood based on multiple regression method was rarely found in scientific articles, even though it is adopted in books.

6. For future studies, the use of the weighted likelihood based on multiple regression gap-filling method combined with a satellite data source is recommended.