
## Abstract

Forecasting plays a key role in water resources planning. Artificial intelligence techniques (AITs) are among the most suitable techniques for forecasting different weather parameters and the resulting runoff. This study compared AITs (RBF-SVM and M5 model tree) to understand the rainfall-runoff process in the Jhelum River Basin, Pakistan, using rainfall and runoff data of the Jhelum River from 1981 to 2012. Different combinations of rainfall and runoff data were used to train and test the AITs, and the data record for 1981–2001 was used for training and then for testing. After training and testing, modeled runoff was evaluated against observed data using R^{2}, NRMSE, COE and MSE. During training, the M5 model tree gave R^{2} = 0.71 for both datasets C2 and C3, and similar results were obtained for dataset C3 using RBF-SVM. Overall, C3 and C7 performed best among all datasets. The M5 model tree performed better than the other applied techniques, and GEP also gave good results for understanding the rainfall-runoff process, whereas RBF-SVM was less accurate than the other applied techniques. Flow duration curves (FDCs) were used to compare the modeled and observed data of the Jhelum River basin. GEP performed well for high and medium-high flows, the M5 model tree gave better results for medium-low and low percentile flows, and RBF-SVM performed better for low percentile flows. GEP was found to be an accurate and highly efficient data-driven model (DDM) among the applied AITs. This study will help in understanding the complex, stochastic rainfall-runoff process. Weather forecasting plays a key role in water resources management and planning.

### Keywords

- Forecasting
- Jhelum River
- GEP
- flow duration curve
- RBF-SVM

## 1. Introduction

Weather forecasting has long been a scientific challenge. Accurate weather forecasts have a direct social and economic impact on the community [1]. Recently, artificial neural networks have been used for weather forecasting. A crucial parameter in weather forecasting is rainfall, which also generates runoff in watersheds; this process is one of the fundamental factors in weather forecasting. Approaches to modeling it range from physically based and conceptual models to artificial intelligence techniques (AITs) [2].

The rainfall-runoff process plays a vital role in sustainable water resources management. Pakistan's economy depends on agriculture, which in turn depends on water resources, and the livelihood of most of the population depends on agriculture. Water storage is necessary, particularly given the rapid growth of the urban population [3, 4]. Efficient and precise modeling of the rainfall-runoff process is crucial for water resources planning and management [5]. Urban water management, runoff forecasting, weather forecasting and irrigation have become current challenges because of the uncertainty of weather forecasts. Rainfall and geographical characteristics are important for accurately forecasting the rainfall-runoff process. The rainfall-runoff process involves diverse sub-processes, and AITs have been used to transform rainfall into runoff [6]. Similarly, the transformation of precipitation into runoff has been investigated in hydrology by various researchers [7, 8], and runoff is a complex process [9]. Runoff forecasting has therefore become an essential issue in hydrology and water resources management.

Rainfall and other meteorological parameters play a crucial role in weather forecasting and are essential for runoff generation [10]. The rainfall-runoff process is non-linear, and simple AITs cannot model it because of the many hydrological variables involved, such as evaporation, infiltration, rainfall intensity, watershed characteristics, and surface and groundwater interaction. Over the last few decades, artificial neural networks (ANNs), genetic programming (GP), support vector machines (SVMs), decision trees (DTs) and adaptive neuro-fuzzy inference systems (ANFIS) have been considered among the most efficient techniques in hydrology and water resources. Several researchers have applied AITs to rainfall-runoff forecasting [11, 12, 13, 14, 15, 16]. The American Society of Civil Engineers (ASCE) task committee examined the application of ANNs in hydrology [17, 18], and ANNs with various algorithms have been applied in different regions of the world [6, 19, 20, 21, 22].

Many studies have shown that ANNs have limitations and drawbacks for streamflow prediction. These include the choice of stopping criteria, overfitting, slow learning, problems with backpropagation, and the need for human intervention in setting parameters such as the number of learning epochs and the learning rate [23]. Thus, approaches are needed that overcome these problems and give better results than ANNs.

After 2000, support vector machines (SVMs), a kernel-based approach, became popular and offered advantages over ANNs. In this study, SVM and DTs were used for rainfall-runoff modeling. SVM was originally developed from statistical machine learning theories (SMLTs) for complex problems such as classification and regression [24]. [25] emphasized the difficulty, in rainfall-runoff prediction, of identifying the best model and its relevant parameters. A modified form of SVM, the least squares support vector machine (LS-SVM), reduces the computational burden [26, 27]. SVM has been used in many studies for different forecasting scenarios [28, 29, 30]. [31] compared SVM and ANNs for rainfall-runoff forecasting using past daily data and found SVM to be more efficient than ANN. [32] used SVM with monthly data for statistical downscaling of rainfall intensity, and SVM was successfully used to predict daily rainfall [33]. Another data-driven model (DDM) is the M5 model tree [34], which uses a divide-and-conquer method to split the dataset into subsets, enabling the system to partition the multi-dimensional input space and automatically build a model based on an overall quality criterion [34]. [35] used SVM with RBF and polynomial kernel functions to model the suspended sediment load of a basin in Iran and found that SVM with the RBF kernel gave the most accurate results. In recent years, many researchers have used the M5 model tree to predict different hydrological components, such as sediment transport and estimation [36], rainfall-runoff [37], flood events [38] and monthly pan evaporation [39], as well as for modeling oblique load-carrying capacity [40] and atypical algal proliferation [41].

As mentioned above, several AITs have been applied to rainfall-runoff forecasting, but some techniques, such as RBF-SVM and the M5 model tree, have not yet been evaluated for the Jhelum River basin. Himalayan rivers, especially those of the Jhelum River basin, originate primarily above 4000 masl and sustain a very large population downstream. However, the Jhelum River basin is very data-limited, and hydro-meteorological data are available mainly for areas below 2000 masl. Given the heavy human demand on these rivers, it is essential to develop strategies based on their hydrology [42, 43, 44, 45]. AITs are therefore valuable for forecasting hydrological variables, especially the rainfall-runoff process, in a region where acquiring and managing hydrological data is difficult.

Keeping previous studies on rainfall-runoff modeling in mind, this study applied different AITs with two primary objectives: (1) to calibrate and validate the AITs (GEP, RBF-SVM and M5 model tree) for modeling the rainfall-runoff process; and (2) to identify the best input combination for the applied AITs. To achieve these objectives, hydrological rainfall and runoff data were used to model the process. Model performance was evaluated with the coefficient of determination (R^{2}), coefficient of efficiency (COE), mean squared error (MSE) and normalized root mean square error (NRMSE).

The input selection process for data-driven rainfall-runoff models is critical, because the input vectors determine the structure of the model and can therefore influence its results. The rest of this paper is organized as follows. Section 1, "Introduction", reviews the literature and the previously employed methodologies. Section 2, "Materials and methods", briefly describes the study area and dataset (daily rainfall from nine gauges and runoff, covering roughly the past thirty years), the input selection, the proposed AITs (RBF-SVM and M5 model tree) and the model fitness criteria. Section 3, "Results and discussions", analyzes the outputs of the applied AITs for modeling the rainfall-runoff process. Section 4, "Conclusion", concludes the study.

## 2. Materials and methods

### 2.1 Study area

The Jhelum River basin is located at 33.14°N and 73.64°E and has a drainage area of 33,867 km^{2}. The river originates in the Pir Panjal range in the north-western part of the Great Himalayan range and receives significant flow contributions from its tributaries. The Kunhar and Neelum rivers join the Jhelum at Muzaffarabad, and the Poonch and Kanshi join it at Mangla reservoir [46]. It is a transboundary river between Pakistan and India, and 56% of the basin area lies in India [47]. 25% of the Jhelum River basin lies under maximum snow accumulation. The dataset for the basin was collected from the Surface Water Hydrology Project (SWHP) for 1981 to 2012. The basin is mainly affected by monsoon rainfall; during the summer season, the rain shadow of the Himalayan range shapes the climate of the eastern Himalaya [48, 49, 50]. The rainfall and flow stations are shown in Figure 1. Western disturbances begin in December, and the monsoon lasts from June to September every year [51]. Rainfall decreases from the northern part of the basin to the eastern region, and annual rainfall was found to vary from 70% to 135% [52, 53]. Astor station was also considered, as it was used by previous researchers for analysis [54].

### 2.2 Auto-correlation function (ACF) and partial auto-correlation function (PACF)

To select suitable input combinations of rainfall and runoff, the autocorrelation function (ACF) [60] and the cross-correlation function (CCF) [61] were applied to the runoff data and to the rainfall-runoff data, respectively, at a 95% confidence level. Tables 1 and 2 show that the cross-correlation between the rainfall and runoff data is poor, which may be an issue for modeling the rainfall-runoff phenomenon [62]. Therefore, the partial autocorrelation between these two input variables was also used. The results in Table 1 indicate that three lag times of the rainfall and runoff data are efficient for the modeling process. Based on these results, the following input combinations were used in this study:

- C1: Q(t-1)
- C2: Q(t-2)
- C3: Q(t-3)
- C4: Q(t-2), Q(t-1), P(t-1), Pt
- C5: Q(t-2), Q(t-1), P(t-2), P(t-1), Pt
- C6: Q(t-3), Q(t-2), Q(t-1), Pt
- C7: Q(t-3), Q(t-2), Q(t-1), P(t-3), P(t-2), P(t-1), Pt
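The lag screening described above can be sketched in plain Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the ACF uses the common biased estimator, the PACF is obtained with the Durbin-Levinson recursion, and the 95% band is the usual large-sample approximation ±1.96/√n for a white-noise series.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def ccf(x, y, max_lag):
    """Sample cross-correlation between x(t - k) and y(t), e.g. rainfall leading runoff."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    denom = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    return [sum(dx[t - k] * dy[t] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelation phi_kk via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    phi_prev = []          # coefficients phi_{k-1, 1..k-1}
    out = [1.0]            # lag 0 by convention
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # update: phi_{k, j} = phi_{k-1, j} - phi_kk * phi_{k-1, k-j}
        phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi.append(phi_kk)
        phi_prev = phi
        out.append(phi_kk)
    return out


def confidence_bound(n, z=1.96):
    """Approximate 95% significance band for correlations of a length-n series."""
    return z / math.sqrt(n)
```

A lag k would be retained as an input when the magnitude of its (partial) correlation exceeds `confidence_bound(len(series))`; for a series of 100 values this band is roughly ±0.196.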

Partial auto-correlation function (PACF) between rainfall and runoff data

| Input variable | Coefficient | Standard error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Q_{obs} | 89224.45 | 30235.64 | 2.95 | 0.00 | 29953.86 | 148495.05 | 29953.86 | 148495.05 |
| P_{(t)} | −1449.13 | 3534.18 | −0.41 | 0.68 | −8377.14 | 5478.88 | −8377.14 | 5478.88 |
| P_{(t-1)} | −840.46 | 3838.46 | −0.22 | 0.83 | −8364.94 | 6684.02 | −8364.94 | 6684.02 |
| P_{(t-2)} | −239.51 | 3838.46 | −0.06 | 0.95 | −7764.01 | 7284.99 | −7764.01 | 7284.99 |
| P_{(t-3)} | −1139.05 | 3534.17 | −0.32 | 0.75 | −8067.04 | 5788.94 | −8067.04 | 5788.94 |

Auto-correlation function of runoff data

| Input combination | Training data | Testing data | Whole data |
|---|---|---|---|
| Q_{obs}, Q_{t-1} | 0.6785 | 0.9148 | 0.6785 |
| Q_{obs}, Q_{t-2} | 0.3194 | 0.8721 | 0.3194 |
| Q_{obs}, Q_{t-3} | 0.0008 | 0.8529 | 0.0008 |

Here Q is discharge (m^{3}/sec), P is precipitation (mm) and t is time (daily time step). Time lags (t), (t-1), (t-2) and (t-3) of Q and P were created and arranged into the input combinations C1, C2, C3, C4, C5, C6 and C7, which were used to train and test the AI techniques.
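A minimal sketch of how the lagged combinations C1 to C7 could be assembled from daily series `q` (runoff) and `p` (rainfall). The dictionary encoding and the function name are hypothetical; the target is assumed to be Q(t), and the first rows are dropped until every lag is available, so combinations with longer lags yield slightly fewer samples.

```python
# Each combination lists (variable, lag) pairs; lag 0 denotes the current step t.
COMBINATIONS = {
    "C1": [("Q", 1)],
    "C2": [("Q", 2)],
    "C3": [("Q", 3)],
    "C4": [("Q", 2), ("Q", 1), ("P", 1), ("P", 0)],
    "C5": [("Q", 2), ("Q", 1), ("P", 2), ("P", 1), ("P", 0)],
    "C6": [("Q", 3), ("Q", 2), ("Q", 1), ("P", 0)],
    "C7": [("Q", 3), ("Q", 2), ("Q", 1), ("P", 3), ("P", 2), ("P", 1), ("P", 0)],
}


def build_dataset(q, p, combo):
    """Return (X, y): each row of X holds the lagged inputs, y holds Q(t)."""
    if len(q) != len(p):
        raise ValueError("q and p must have the same length")
    spec = COMBINATIONS[combo]
    max_lag = max(lag for _, lag in spec)
    series = {"Q": q, "P": p}
    X, y = [], []
    for t in range(max_lag, len(q)):       # skip steps where a lag is missing
        X.append([series[var][t - lag] for var, lag in spec])
        y.append(q[t])
    return X, y
```

Because the rows stay in time order, a chronological training/testing split can then be made by slicing `X` and `y` at the index of the last training day.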

### 2.3 Support vector machine (SVM)

A brief description of SVM is given here; its theory [24] has been discussed in detail by many researchers, e.g. [28, 29, 30]. According to [24], in SVM an independent variable x is used to estimate a dependent variable y. As in other regression problems, the relationship between x and y is expressed by a function of the form

$$f(x) = w \cdot \phi(x) + b$$

where φ is the nonlinear mapping associated with the kernel function, which transforms the input data into the desired feature space. Different SVM algorithms use different kernel functions, e.g. sigmoid, polynomial, linear and RBF kernels. w is the weight (coefficient) vector and b is a constant; w and b are the parameters of the regression function, while noise is handled through an error tolerance ε. Training the SVM consists of successively optimizing the error function. Depending on the error function, there are two kinds of SVM regression models, ε-SVM (Regression I) and ν-SVM (Regression II) [63]. In this study, RBF-kernel ε-SVM regression (Regression I) is used for rainfall-runoff prediction. [64, 65] proposed reducing SVM training time by selecting the RBF kernel parameters automatically, which is faster and more efficient than V-fold cross-validation. Let $\{x^{i}_{j}\}_{j=1}^{N_i} \subset \mathbb{R}^{d}$ be the data of class $i$, where $N_i$ is the number of training samples in class $i$, $i = 1, 2, \ldots, L$, and $L$ is the number of classes in the dataset. The RBF kernel is then

$$K(x, x'; \sigma) = \exp\left(-\frac{\lVert x - x' \rVert^{2}}{2\sigma^{2}}\right)$$

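Here K is the kernel function, x and x′ are elements of $\mathbb{R}^{d}$, and $\sigma \in \mathbb{R} \setminus \{0\}$ is the corresponding kernel parameter. The kernel has two important properties [66], shown in Figure 2: (i) the cosine of the angle between any two mapped training samples lies between 0 and 1, i.e. $0 < K(x, x'; \sigma) \le 1$; and (ii) every mapped sample has unit norm, i.e. $K(x, x; \sigma) = 1$.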
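The RBF kernel and its two properties can be checked numerically with a short sketch, assuming the common Gaussian form K(x, x′; σ) = exp(−‖x − x′‖²/(2σ²)). The function names are illustrative. The conversion to `gamma` reflects the parameterization exp(−gamma·‖x − x′‖²) used by libraries such as LIBSVM, which is one plausible reading of the "Gamma" parameter mentioned in the results.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel K(x, z; sigma) = exp(-||x - z||^2 / (2 * sigma^2))."""
    if sigma == 0:
        raise ValueError("sigma must be non-zero")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigma_to_gamma(sigma):
    """Equivalent gamma for the form exp(-gamma * ||x - z||^2)."""
    return 1.0 / (2.0 * sigma ** 2)


if __name__ == "__main__":
    x, z = [1.0, 2.0], [2.0, 0.0]
    print(rbf_kernel(x, x, 0.8))   # property (ii): unit norm, prints 1.0
    print(rbf_kernel(x, z, 0.8))   # property (i): value lies in (0, 1]
```

For very distant points the exponential can underflow to 0.0 in floating point, so property (i) holds mathematically but not always numerically.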

Because an RBF kernel is used in this study, the mean of the kernel values over the training samples is then calculated.

The quantity b(σ) is computed only for σ > 0. The kernel parameter σ of the RBF-kernel SVM can then be determined through the following steps.

To determine the best kernel parameter, the resulting expression is optimized with respect to σ.
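The expressions for the kernel means and the objective are not reproduced in the text. One formulation consistent with the description (within-class kernel values close to 1, between-class values close to 0, σ > 0) minimizes J(σ) = 1 − w(σ) + b(σ), where w(σ) is the mean kernel value over same-class pairs and b(σ) the mean over different-class pairs. The sketch below assumes that form and searches a grid of candidate σ values; it is an illustration under that assumption, not a verified reproduction of the method in [64, 65, 66], and the function names are hypothetical.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_means(classes, sigma):
    """Return (w, b): mean kernel value over same-class pairs and over
    different-class pairs. `classes` is a list of lists of samples."""
    if len(classes) < 2:
        raise ValueError("need at least two classes")
    w_sum = b_sum = 0.0
    w_cnt = b_cnt = 0
    for i, ci in enumerate(classes):
        for l, cl in enumerate(classes):
            for x in ci:
                for z in cl:
                    k = rbf_kernel(x, z, sigma)
                    if i == l:
                        w_sum += k
                        w_cnt += 1
                    else:
                        b_sum += k
                        b_cnt += 1
    return w_sum / w_cnt, b_sum / b_cnt


def select_sigma(classes, candidates):
    """Grid search for sigma > 0 minimising J(sigma) = 1 - w(sigma) + b(sigma)."""
    best_sigma, best_j = None, math.inf
    for sigma in candidates:
        if sigma <= 0:
            raise ValueError("sigma must be positive")
        w, b = kernel_means(classes, sigma)
        j = 1.0 - w + b
        if j < best_j:
            best_sigma, best_j = sigma, j
    return best_sigma, best_j
```

A very small σ drives w(σ) toward the fraction of self-pairs, while a very large σ pushes b(σ) toward 1, so the objective favors an intermediate width.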

With the RBF kernel parameter fixed, V-fold cross-validation is then used to determine the best penalty parameter C.
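The V-fold search for the penalty parameter can be sketched as follows. Because an ε-SVR solver is beyond a short example, a one-variable ridge regression whose penalty is 1/C stands in for the SVM (larger C means weaker regularization, as in SVM). The folds are contiguous blocks, which suits time series but is an assumption, since V-fold cross-validation often shuffles samples first. All function names are hypothetical.

```python
def v_fold_splits(n, v):
    """Yield (train_idx, test_idx) for v contiguous folds whose sizes differ by at most 1."""
    if not 2 <= v <= n:
        raise ValueError("need 2 <= v <= n")
    sizes = [n // v + (1 if i < n % v else 0) for i in range(v)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def fit_ridge_1d(xs, ys, c):
    """Stand-in model y ~ w * x with penalty 1/c (larger c = weaker regularisation)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + 1.0 / c)


def cv_mse(xs, ys, c, v):
    """Mean squared error over all held-out samples of a v-fold split."""
    total, count = 0.0, 0
    for train, test in v_fold_splits(len(xs), v):
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], c)
        for i in test:
            total += (ys[i] - w * xs[i]) ** 2
            count += 1
    return total / count


def select_penalty(xs, ys, grid, v=5):
    """Return the penalty value with the lowest cross-validated MSE, plus all scores."""
    scores = {c: cv_mse(xs, ys, c, v) for c in grid}
    best = min(scores, key=scores.get)
    return best, scores
```

Replacing `fit_ridge_1d` with a real ε-SVR fit (and its prediction step) would turn this into the intended search without changing the cross-validation loop.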

Following the RBF kernel-based SVM theory of [66], this technique is employed in this research for rainfall-runoff modeling.

### 2.4 M5 model tree

The M5 model tree divides the input space into regions and builds a linear regression model in each. The result can be viewed as a modular model, or committee machine, in which linear models are specialized on appropriate subsets of the input space. This idea is not new: combining specialized ("local") models is well established in modeling, and there is a clear analogy between model trees (MTs) and the combinations of linear models used in dynamic hydrology since the 1970s; a notable paper on multilinear models is [67]. Based on an information-theoretic principle, the M5 model tree divides the multi-dimensional space and builds the models automatically according to a quality criterion, and the number of models can vary. Computational intelligence techniques that combine several models, and combinations of theory-driven and data-driven models, have supporters in hydrology (e.g. [68] combined hydrological models in a fuzzy system). Although computational requirements generally grow rapidly with dimensionality [34], model trees handle high-dimensional tasks efficiently, with up to hundreds of attributes. The main advantage of model trees over regression trees is that they are smaller; the decision strength is clear, and the regression functions do not normally involve many variables. The M5 algorithm used to induce a model tree works as follows.

Suppose a collection T of training examples is available. Each example is characterized by the values of a fixed set of input attributes and has an associated target value. The goal is to build a model that relates the target values of the training examples to their input attributes. The quality of the model is measured by the accuracy with which it predicts the target values of unseen cases, as shown in Figure 3.
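M5 chooses each split by the standard deviation reduction, SDR = sd(T) − Σ_i (|T_i|/|T|)·sd(T_i), i.e. the expected reduction in error when T is partitioned into subsets T_i (the criterion of Quinlan's M5). The sketch below grows only a single split and fits a one-variable least-squares line on the split attribute in each leaf; real M5 grows deeper trees, fits multivariate linear models, and then prunes and smooths them, all of which are omitted here. Function names are hypothetical.

```python
import statistics


def sdr(parent, left, right):
    """Standard deviation reduction used by M5 to score a candidate split."""
    n = len(parent)
    return (statistics.pstdev(parent)
            - len(left) / n * statistics.pstdev(left)
            - len(right) / n * statistics.pstdev(right))


def best_split(X, y):
    """Try every attribute and midpoint threshold; return (attr, threshold, sdr)."""
    best = None
    for a in range(len(X[0])):
        values = sorted(set(row[a] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [t for row, t in zip(X, y) if row[a] <= thr]
            right = [t for row, t in zip(X, y) if row[a] > thr]
            gain = sdr(y, left, right)
            if best is None or gain > best[2]:
                best = (a, thr, gain)
    return best


def fit_line(xs, ys):
    """Least-squares line (intercept, slope); slope 0 if x is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    slope = sum((x - mx) * (t - my) for x, t in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def fit_one_split_model_tree(X, y):
    """One SDR-chosen split with a linear model on the split attribute in each leaf."""
    split = best_split(X, y)
    if split is None:
        raise ValueError("no attribute has two distinct values")
    a, thr, _ = split
    left = [(row[a], t) for row, t in zip(X, y) if row[a] <= thr]
    right = [(row[a], t) for row, t in zip(X, y) if row[a] > thr]
    models = {True: fit_line(*zip(*left)), False: fit_line(*zip(*right))}

    def predict(row):
        intercept, slope = models[row[a] <= thr]
        return intercept + slope * row[a]

    return predict, (a, thr)
```

On piecewise-linear data, such as a flow regime that jumps above a threshold, the SDR criterion places the split at the break and each leaf line then fits its own segment.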

### 2.5 Model performance

Four performance criteria were used to evaluate the reliability of the AITs for the rainfall-runoff process [22, 55]: (1) the coefficient of determination (R^{2}) [56]; (2) the normalized root mean square error (NRMSE) [57]; (3) the Nash-Sutcliffe coefficient of efficiency (COE) [58]; and (4) the mean square error (MSE) [59].

In these criteria, Q_{obs} and Q_{pre} are the observed and predicted flows, respectively, and Q_{mean} is the mean of the observed flows.
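Because the formulas themselves are not shown, the sketch below uses common textbook definitions and should be read as an assumption: R^{2} as the squared Pearson correlation, COE as the Nash-Sutcliffe efficiency 1 − Σ(Q_{obs} − Q_{pre})²/Σ(Q_{obs} − Q_{mean})², MSE as the mean squared difference, and NRMSE as the RMSE divided by the mean observed flow (other normalizations, e.g. by the range of observed flows, are also in use).

```python
import math


def mse(obs, pre):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(obs, pre)) / len(obs)


def nrmse(obs, pre):
    """RMSE normalised by the mean observed flow (assumed normalisation)."""
    return math.sqrt(mse(obs, pre)) / (sum(obs) / len(obs))


def coe(obs, pre):
    """Nash-Sutcliffe coefficient of efficiency."""
    q_mean = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pre))
    sst = sum((o - q_mean) ** 2 for o in obs)
    return 1.0 - sse / sst


def r2(obs, pre):
    """Coefficient of determination as the squared Pearson correlation."""
    mo, mp = sum(obs) / len(obs), sum(pre) / len(pre)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pre))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pre))
    return (cov / (so * sp)) ** 2
```

With obs = [1, 2, 3, 4] and pre = [2, 3, 4, 5], a constant bias of one unit leaves R^{2} at 1.0 but reduces COE to 0.2, illustrating that the two criteria measure different aspects of model fit.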

## 3. Results and discussions

### 3.1 Rainfall forecasting

Flow duration curves (FDCs) were employed to evaluate the applied AITs against the percentage of time a flow is equaled or exceeded. The FDCs for all input combinations (C1, C2, C3, C4, C5, C6 and C7) showed a good relationship with the applied AITs in both the training and testing periods. To understand the behavior of the applied AITs in the Jhelum River basin, FDC analysis was carried out using the nine rainfall stations for modeling the rainfall-runoff process; because the runoff data were collected at Mangla reservoir for 1981–2012, it was necessary to understand the behavior of all techniques across the catchment.

To assess their capability, the low, medium and high percentile flows of the observed hydrographs were compared with those extracted from the AITs (GEP, RBF-SVM and M5 model tree). As described in [52, 69, 70], an FDC shows the relationship between observed or modeled percentile flow and its exceedance probability over a given period. Flows with exceedance probabilities of 1–10% are considered high, 11–89% medium, and 90–100% low.
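An FDC ranks flows from largest to smallest and assigns each an exceedance probability. The sketch below assumes the Weibull plotting position m/(n + 1), one common choice, and maps exceedance to the flow classes used in this section: high up to 10%, medium-high (11–49%), medium-low (50–89%) and low from 90%. The handling of the class boundaries for non-integer percentages is an assumption, and the names are hypothetical.

```python
def flow_duration_curve(flows):
    """Return (exceedance %, flow) pairs, largest flow first, using m / (n + 1)."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    return [(100.0 * m / (n + 1), q) for m, q in enumerate(ranked, start=1)]


def flow_class(exceedance):
    """Map an exceedance percentage to a flow class."""
    if exceedance <= 10.0:
        return "high"
    if exceedance < 50.0:
        return "medium-high"
    if exceedance < 90.0:
        return "medium-low"
    return "low"


def classify_flows(flows):
    """Group flows by class so observed and modelled FDC segments can be compared."""
    groups = {"high": [], "medium-high": [], "medium-low": [], "low": []}
    for exceedance, q in flow_duration_curve(flows):
        groups[flow_class(exceedance)].append(q)
    return groups
```

Applying `classify_flows` separately to observed and modeled series gives matching segments that can be compared class by class, which is the kind of comparison summarized for GEP, RBF-SVM and the M5 model tree.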

Medium flows are further divided into medium-high (11–49%) and medium-low (50–89%) flows. The FDCs showed that GEP was the best AIT for high and medium-high flows, agreeing most closely with the FDC of observed flow, whereas the FDC of the M5 model tree agreed better with medium-low and low percentile flows, and that of RBF-SVM with low percentile flows. Compared with the other AITs, GEP was found to be a more accurate and highly efficient DDM. These trends are shown in Figure 4.

**RBF-SVM**

In RBF kernel-based SVM modeling, the performance of each input combination was tuned by adjusting the model parameters Gamma, C and P; in other words, the successful application of RBF-SVM depends on accurately determining these parameters. Figure 5 and Table 3 show the results for the different input combinations in terms of the model evaluation criteria. RBF-SVM shows good potential in both the training and testing periods of rainfall-runoff modeling, and most input combinations performed well. For the training period, R^{2}, COE, MSE and NRMSE were 0.99, 1.00, 21245.92 and 820420.17 m^{3}/sec with input C3, and 0.99, 1.00, 21475.00 and 825413.21 m^{3}/sec with input C6, respectively. In contrast, C2 and C4 were poor combinations during training, with results of 0.16, 1.00, −16623.59 and 833046.88 m^{3}/sec, and 0.11, 1.00, 980.10 and 988371.24 m^{3}/sec, respectively. In the testing period, RBF-SVM achieved better prediction accuracy: R^{2}, COE, MSE and NRMSE were 1.00, 1.00, 188.52 and 1437.96 m^{3}/sec with input C1, and 1.00, 1.00, 147.81 and 1128.49 m^{3}/sec with input C5, respectively.

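RBF-SVM performance for the input combinations during training and testing

| Criterion | | | | | | |
|---|---|---|---|---|---|---|
| R^{2} (training) | 0.97 | 0.16 | 0.16 | 0.11 | 0.99 | 0.98 |
| COE (training) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| MSE (training) | 10366 | −8311 | −1703 | 490 | 10737 | 10409 |
| NRMSE (training) | 401115 | 416523 | 452420 | 494185 | 412706 | 414639 |
| R^{2} (testing) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| COE (testing) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| MSE (testing) | 94 | 131 | 154 | 121 | 137 | 96 |
| NRMSE (testing) | 718 | 654 | 691 | 588 | 613 | 555 |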

### 3.2 M5 model tree

The training and testing results of the M5 model tree for the rainfall-runoff process confirm that it can identify the relationship between the two hydrological variables of the catchment. This is supported by the model evaluation criteria, with low NRMSE and high R^{2} and COE values in the validation and testing phases, indicating a good model fit. Table 4 shows that the M5 model tree reproduces runoff well with the different rainfall-runoff input combinations. The training results indicate that inputs Q(t-2) and Q(t-3) (C2 and C3) predict the rainfall-runoff process quite well, each giving R^{2}, COE, MSE and NRMSE of 0.71, 1.00, 0.00 and 757158.18 m^{3}/sec, respectively. In testing, R^{2}, COE, MSE and NRMSE were 1.00, 1.00, 0.00 and 887.52 m^{3}/sec with input C7, showing that the M5 model tree gives good results in testing when rainfall and runoff inputs are combined. Overall, the high R^{2} and COE and low NRMSE obtained in verification demonstrate the good performance of the M5 model tree.

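M5 model tree performance for the input combinations during training and testing

| Criterion | | | | | | |
|---|---|---|---|---|---|---|
| R^{2} (training) | 0.70 | 0.65 | 0.65 | 0.65 | 0.65 | |
| COE (training) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |
| MSE (training) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| NRMSE (training) | 378605 | 378596 | 378596 | 378570 | 378570 | |
| R^{2} (testing) | 1.00 | 1.00 | 0.99 | 1.00 | 0.99 | 0.99 |
| COE (testing) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| MSE (testing) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| NRMSE (testing) | 719 | 662 | 683 | 555 | 612 | 698 |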

## 4. Conclusion

This study compared AITs (RBF-SVM and M5 model tree) to understand the rainfall-runoff process in the Jhelum River Basin. Different combinations of rainfall and runoff data were used to train and test the AITs, and after training and testing, the modeled runoff was evaluated against observed data using R^{2}, NRMSE, COE and MSE. The conclusions of this study are as follows:

Seven input combinations (C1, C2, C3, C4, C5, C6 and C7) of lagged past daily rainfall and runoff were analyzed. Overall, C3 and C7 performed best among all the datasets, giving efficient and accurate results in both the training and testing phases.

The M5 model tree performed better than the other applied techniques. GEP also gave good results in understanding the rainfall-runoff process, while RBF-SVM was less accurate than the other applied techniques.

Flow duration curves (FDCs) were used to compare the modeled and observed data of the Jhelum River basin. GEP performed well for high and medium-high flows, the M5 model tree gave better results for medium-low and low percentile flows, and RBF-SVM performed better for low percentile flows. GEP was found to be an accurate and highly efficient DDM among the applied AITs.

This study will help in understanding the complex, stochastic rainfall-runoff process. Streamflow and weather forecasting play a key role in water resources management and planning.

## Acknowledgments

Meteorological data were provided by the Pakistan Meteorological Department, and streamflow data were provided by WAPDA.