Statistical summary of hydro-meteorological variables at Vanderkloof dam

## 1. Introduction

Concerns about climate change have prompted calls for action at every level of government and across many sectors of the economy and society. It is therefore pertinent to establish a suite of coordinated activities that will examine the serious and sweeping issues associated with global climate change, including the scientific and technological challenges involved, and provide advice on the actions and strategies that nations can take in response [1]. A proper understanding of what climate is, and of the disruptions that variation in climate (climate change) may cause to social and economic stability, is therefore of paramount importance. Global warming is no longer a speculation. The threat is real and has far-reaching consequences. It is therefore necessary to sensitize the peoples of all nations to the imminent danger posed by global warming and the depletion of fresh water resources.

According to [2], water scarcity has emerged as a global issue in recent times, and South Africa, currently categorized as a water-stressed country, is forecast to experience physical water scarcity by the year 2025, with an annual freshwater availability of less than 1000 m^{3} per capita. The main cause of the scarcity is growing water demand set against limited water resources. The situation is further aggravated by population growth, economic development, urbanization and, more recently, anthropogenic climate change ([3]; [4]).

Global climate change is a sensitive subject which affects the environment, ecology and quality of life on the earth. Global variations in climate have brought about extreme events such as floods and droughts, which have had drastic impacts on river basin development structures. Such structures include dams, on which nations in Africa have depended for most of their renewable energy generation [5].

[6] defined global warming as an average increase in the earth's temperature, which in turn causes changes in climate, and reported that global warming intensifies the water cycle. It is speculated that, as a result of global warming, more clouds will form and there will be more rain and snow, especially in areas close to water, whereas in areas far from water sources excessive evaporation would dry out soil and vegetation, resulting in fewer clouds and less precipitation. Such areas will probably experience more droughts, rivers and lakes will become shallower, and the amount of groundwater will decrease.

Global warming or climate variability is expected to alter the timing and magnitude of runoff and soil moisture. As a result, it has important implications for the existing hydrological balance and water resources, as well as for future water resources planning and management. Quantitative estimation of the hydrological effects of climate change is therefore essential for understanding and solving potential water resource problems that may occur in the future ([7]; [8]). In the past, decisions relating to the management of extreme climatic conditions, especially as they affect developments within river basins, have been either experimental or experiential [9]. Such subjective approaches have not been able to provide quantitative measures for predicting future climate change impacts.

This study aimed to use statistical and mathematical modelling approaches to provide quantitative measures of past, present and future climate change impacts in the Vanderkloof river basin, South Africa. It focused on the specific impact of global warming on the operation of the Vanderkloof dam. The specific objectives were to study the impact of global warming on the Vanderkloof River catchment in order to determine its effect on the hydrology of the basin, especially on the municipal water supply, hydropower and irrigation systems, and to suggest ways of maintaining and improving the outputs of the existing system in spite of climate change. The outcome of the study will help determine the impact of climate change on the Vanderkloof River basin. It will also be useful in suggesting water management options and preparing operational guides for the dam system so as to optimize the use of available water resources, and in making recommendations to policy makers and the dam authorities to enhance the future operation of the dam.

## 2. Methodology

### 2.1. Data and statistics

Long-term time series data for 14 hydro-meteorological variables of the Vanderkloof watershed were analysed using mathematical and statistical methods, with the aim of developing quantitative models that can be used to forecast future climate change scenarios in the basin and evaluate the performance of the dam system. The variables are the average annual values of minimum temperature, maximum temperature, average temperature, wind speed, watershed precipitation, dam surface precipitation, dam surface evaporation, reservoir inflow, reservoir outflow, reservoir elevation, reservoir storage, turbine release, irrigation water release and municipal water supply. The records for minimum temperature, maximum temperature, average temperature, wind speed and watershed precipitation span 1977 to 2011 (35 years); those for dam surface precipitation, dam surface evaporation, reservoir inflow, reservoir outflow, reservoir elevation, reservoir storage and turbine release span 1977 to 2008 (32 years); and those for irrigation water release and municipal water supply span 1990 to 2008 (19 years). Sample statistics such as the mean, standard deviation, variance, skewness coefficient and kurtosis were computed to provide insight into the characteristics (e.g. centre and dispersion) of the respective populations. Normality of the data was examined using the skewness coefficient and kurtosis.
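The sample statistics listed above can be sketched in Python as follows. This is a minimal illustration rather than the Excel workflow used in the study. The bias-corrected skewness and excess-kurtosis formulas (the ones implemented by Excel's SKEW and KURT functions) are an assumption, since the text does not state which estimators were applied, and the function name `summary_statistics` and the sample values are hypothetical.

```python
import math
import statistics


def summary_statistics(values):
    """Return mean, sample variance, standard deviation, skewness and excess kurtosis.

    Assumption: skewness and kurtosis use the bias-corrected sample formulas
    (the same ones as Excel's SKEW and KURT); the paper does not say which
    estimators were used.
    """
    n = len(values)
    if n < 4:
        raise ValueError("at least four observations are needed for kurtosis")
    mean = statistics.fmean(values)
    variance = statistics.variance(values)  # sample variance, divisor n - 1
    std = math.sqrt(variance)
    if std == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    z3 = sum(((v - mean) / std) ** 3 for v in values)
    z4 = sum(((v - mean) / std) ** 4 for v in values)
    skewness = n / ((n - 1) * (n - 2)) * z3
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "variance": variance,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


if __name__ == "__main__":
    # Hypothetical annual values, for illustration only (not data from the study).
    sample = [11.2, 11.9, 12.4, 11.1, 11.6, 12.0, 11.5]
    for name, value in summary_statistics(sample).items():
        print(f"{name}: {value:.3f}")
```

For a normally distributed population both the skewness and the excess kurtosis are expected to be close to zero, which is why these two statistics can serve as a quick normality screen.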

### 2.2. Estimation of autocorrelation coefficient and pre‐whitening of data series

Autocorrelation (serial correlation) may increase the expected number of false-positive trends. If autocorrelation exists in the time-series data, an approach to remove it needs to be adopted. The approach used to detect lag *k* autocorrelation is based on the equation ([10]; [11]):

$$r_k = \frac{\sum_{t=1}^{N-k}\left(x_t-\bar{x}\right)\left(x_{t+k}-\bar{x}\right)}{\sum_{t=1}^{N}\left(x_t-\bar{x}\right)^{2}} \tag{1}$$

where *x*_{t} is the time-series data value at time *t*, $\bar{x}$ is the sample mean and *N* is the number of samples for a constant sampling interval. The value of *r*_{k} lies between −1 and 1; a value of 0 means that the time series is serially independent, while a value approaching 1 means that strong positive autocorrelation exists [10]. Autocorrelation is tested for at the 95% significance level. The first-order autocorrelation coefficient *r*_{1} is especially important because, for physical systems, dependence on past values is likely to be strongest for the most recent past. For the one-sided test, the World Meteorological Organization recommends that the 95% significance level for *r*_{k} be computed by [11]:

$$r_k(95\%) = \frac{-1 + 1.645\sqrt{N-k-1}}{N-k} \tag{2}$$

where *N* is the sample size and *k* is the lag. If the computed value of autocorrelation is greater than the critical value at a significance level of 95%, then the existence of autocorrelation in the time series data is not by chance and a method for removing it may be adopted. The most common approach for removing the impact of serial correlation in time-series data is the pre-whitening method, as follows [10]:

$$xp_t = x_t - r_k\,x_{t-k} \tag{3}$$

where *xp*_{t} is the pre-whitened series for time interval *t*, *x*_{t} is the original variable *x* for time interval *t*, and *r*_{k} is the estimated serial correlation coefficient at lag *k*. In this study, equations 1 to 3 were adopted to detect and, where present, remove serial correlation in the time series of the hydro-meteorological variables.
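The autocorrelation check and pre-whitening step described in this section can be sketched as below. This is a hedged illustration under stated assumptions: the lag *k* coefficient is taken as the usual ratio of the lagged cross-product sum to the total sum of squares about the sample mean, the one-sided 95% critical value is taken as (−1 + 1.645√(N − k − 1))/(N − k), and pre-whitening subtracts *r*_{k} times the value *k* steps earlier. The function names are hypothetical.

```python
import math


def lag_autocorrelation(x, k=1):
    """Lag-k autocorrelation coefficient (assumed estimator, see lead-in)."""
    n = len(x)
    if not 0 < k < n:
        raise ValueError("lag k must satisfy 0 < k < len(x)")
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    den = sum((v - mean) ** 2 for v in x)
    if den == 0:
        raise ValueError("autocorrelation is undefined for constant data")
    return num / den


def critical_autocorrelation(n, k=1):
    """One-sided 95% critical value for r_k (assumed form, see lead-in)."""
    return (-1.0 + 1.645 * math.sqrt(n - k - 1)) / (n - k)


def prewhiten(x, k=1):
    """Remove lag-k serial correlation: xp_t = x_t - r_k * x_(t-k)."""
    r = lag_autocorrelation(x, k)
    return [x[t] - r * x[t - k] for t in range(k, len(x))]


def prewhiten_if_needed(x, k=1):
    """Pre-whiten only when r_k exceeds the 95% critical value."""
    if lag_autocorrelation(x, k) > critical_autocorrelation(len(x), k):
        return prewhiten(x, k)
    return list(x)
```

With *N* = 25 and *k* = 1, for example, the assumed critical-value expression evaluates to about 0.287. Note that pre-whitening shortens the series by *k* values.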

### 2.3. Statistical trend analysis

Trend analysis is a statistical method widely used to analyze hydrological time series of temperature, streamflow, precipitation and other climatic variables ([10]; [12]; [8]; [13]). The statistical trend analysis in this study was carried out in three phases. First, the nonparametric Mann-Kendall test was applied to detect the presence of a monotonic increasing or decreasing trend in the time series of the hydro-meteorological random variables. Second, the slope of a linear trend in the data series was estimated using two methods of linear regression analysis. Last, the Pearson Product Moment Correlation Coefficient between each variable and time was computed to determine the strength of the linear relationship between the variable and time.

*2.3.1. Trend detection*

The time series of hydrological random variables often exhibit significant trends over time. Trend detection in hydro-meteorological time series is of practical importance in analyzing the impacts of global warming and climate change on the various ecosystems of the earth. Statistical procedures may be adopted for the detection of gradual trends in hydrological series over time. The purpose of trend testing is to determine, in statistical terms, whether the values of a random variable are generally increasing (or decreasing) over some period of time [14]. The Mann-Kendall test is a non-parametric test which is commonly adopted to detect monotonic trends in hydrologic data analysis. This test does not require the assumption of normality of the random variable, and it indicates only the direction, not the magnitude, of significant trends. Trends detectable by the Mann-Kendall method are not necessarily linear. Moreover, the Mann-Kendall test is less affected by the presence of outliers because its test statistic *S* is based on the sign of differences and not directly on the values of the random variable. Owing to these advantages, the test has been applied in a series of recent climate studies ([15]; [16]; [13]; [10]; [12]). The null hypothesis *H*_{0} in the Mann-Kendall test is that there is no trend and the data are independent and randomly ordered. This is tested against the alternative hypothesis *H*_{1} that there exists a trend in the time series. The Mann-Kendall test statistic *S* is estimated using the equation ([12]; [14]):

$$S = \sum_{k=1}^{n-1}\sum_{j=k+1}^{n}\operatorname{sgn}\left(x_j - x_k\right) \tag{4}$$

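where *x*_{j} and *x*_{k} are the annual values in years *j* and *k* respectively, *j* > *k*, and

$$\operatorname{sgn}\left(x_j - x_k\right) = \begin{cases} 1 & \text{if } x_j - x_k > 0 \\ 0 & \text{if } x_j - x_k = 0 \\ -1 & \text{if } x_j - x_k < 0 \end{cases} \tag{5}$$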

A large positive value of the *S* statistic indicates an increasing trend, while a large negative value indicates a decreasing trend in the time series of the random variable. The probability associated with *S* and the sample size *n* must, however, be evaluated to determine the statistical significance of the trend [17]. The variance of *S* is computed as

$$VAR(S) = \frac{1}{18}\left[n(n-1)(2n+5) - \sum_{p=1}^{q} t_p\left(t_p-1\right)\left(2t_p+5\right)\right] \tag{6}$$

where *q* is the number of tied groups and *t*_{p} is the number of data values in the *p*^{th} group. For a sample size of *n* > 10, the statistic *S* is approximately normally distributed. The computed values of *S* and *VAR(S)* are used to compute a standardized test statistic *Z* as follows [14]:

$$Z = \begin{cases} \dfrac{S-1}{\sqrt{VAR(S)}} & \text{if } S > 0 \\ 0 & \text{if } S = 0 \\ \dfrac{S+1}{\sqrt{VAR(S)}} & \text{if } S < 0 \end{cases} \tag{7}$$

The statistical significance of the *Z* values is tested at the 95% and 99% levels of significance. The critical values of *Z* at the 95% and 99% significance levels are *Z*_{0.025} = 1.96 and *Z*_{0.005} = 2.58 respectively. The trend is said to be decreasing if *Z* is negative and the absolute value of *Z*, computed using equation (7), is greater than the critical value, while it is increasing if *Z* is positive and greater than the critical value. If the absolute value of *Z* is less than the critical value, the null hypothesis of no trend cannot be rejected and no trend is inferred [17]. A significant trend implies that the trend is unlikely to have arisen by chance in the selection of the random sample. If a trend is significant at the 99% level of significance, it is said to be highly significant [18].
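The Mann-Kendall procedure described above can be sketched in Python as follows. This is an illustrative implementation of the standard test (sign sum *S*, tie-corrected variance, continuity-corrected *Z*), not the VBA program used in the study; the function names are hypothetical.

```python
import math
from collections import Counter


def mann_kendall(x):
    """Return (S, VAR(S), Z) for the Mann-Kendall trend test."""
    n = len(x)
    # S: sum of the signs of all pairwise differences x_j - x_k with j > k.
    s = 0
    for k in range(n - 1):
        for j in range(k + 1, n):
            diff = x[j] - x[k]
            s += (diff > 0) - (diff < 0)
    # Tie correction: each group of t equal values contributes t(t-1)(2t+5).
    tie_term = sum(t * (t - 1) * (2 * t + 5)
                   for t in Counter(x).values() if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    # Standardized statistic with continuity correction.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, var_s, z


def trend_is_significant(z, critical=1.96):
    """Two-sided check against a critical value (1.96 for 95%, 2.58 for 99%)."""
    return abs(z) > critical
```

As a worked check, *S* = 111 and *VAR(S)* = 817 (the values reported for irrigation water release in Table 4) give *Z* = 110/√817 ≈ 3.848, which exceeds both 1.96 and 2.58.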

*2.3.2. Development of regression models*

Regression analysis involves the use of mathematical and statistical techniques for modeling and analyzing several variables, with the aim of developing quantitative relationships between a dependent variable and one or more independent variables. Specifically, regression analysis provides insight into how the typical value of a dependent variable changes when one of the independent variables is varied while the others are held constant. Regression analysis is widely used for prediction and forecasting. This study employs two methods of linear regression to develop quantitative statistical models for climate change analysis in the Vanderkloof River Basin, South Africa. The first is a parametric method, while the second is a nonparametric method of linear regression analysis. The parametric method employs least-squares deviations, while the non-parametric method uses the Theil-Sen estimator of slope to develop the respective linear model equations. Parametric methods are suitable when the population can be assumed to conform to a particular probability distribution, whereas non-parametric methods are distribution-free. Non-parametric methods are often used because of their advantages: simplicity, the capability of handling non-normal distributions and missing data, and robustness to the effects of outliers and gross data errors ([10]; [19]).

*2.3.2.1. Method of least squares*

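The least-squares method of linear regression requires the assumptions of normality of residuals, constant variance, and true linearity of the relationship [12]. Normality implies that the population from which the sample was drawn is normally distributed. Many statistical procedures rely on population normality. The null hypothesis for a normality test states that the population is normal; the alternative hypothesis states that it is not. The regression equation is obtained as [20]:

$$y = ax + b \tag{8}$$

where *y* is the dependent variable, *x* is the independent variable (time in this study), *a* is the slope and *b* is the intercept.

*a* and *b* are evaluated using the following equations [20]:

$$a = \frac{n\sum xy - \sum x\sum y}{n\sum x^{2} - \left(\sum x\right)^{2}} \tag{9}$$

$$b = \frac{\sum y - a\sum x}{n} \tag{10}$$

where *n* is the number of data pairs.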
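A minimal Python sketch of the least-squares fit is given below. It assumes the line is written as *y* = *ax* + *b* with slope *a* and intercept *b*, as in the regression model equations of Table 5, and uses the standard closed-form normal-equation solution. The function name and the sample values are hypothetical.

```python
def least_squares_line(x, y):
    """Return (a, b) for the least-squares line y = a*x + b."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical; slope is undefined")
    a = (n * sxy - sx * sy) / denom  # slope
    b = (sy - a * sx) / n            # intercept
    return a, b


if __name__ == "__main__":
    # Hypothetical (year, value) pairs, for illustration only.
    years = [1977, 1978, 1979, 1980, 1981]
    values = [22.1, 22.4, 22.2, 22.6, 22.7]
    slope, intercept = least_squares_line(years, values)
    print(f"y = {slope:.4f}x + {intercept:.2f}")
```

Because calendar years are large numbers, the intercept is very sensitive to rounding of the slope; subtracting the first year from *x* before fitting is a common way to reduce this sensitivity.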

*2.3.2.2. Theil-Sen method of linear regression*

Sen’s slope estimator, also known as the Kendall robust line-fit method, is a non-parametric method of robust linear regression that chooses the median slope among all lines through pairs of two-dimensional sample points. This method offers many advantages and competes well against simple least squares even for normally distributed data. It can be computed efficiently and is insensitive to the presence of outliers; it can be significantly more accurate than simple linear regression for skewed and heteroskedastic data. Missing values are allowed and the data need not conform to any particular distribution. Moreover, Sen’s method is not greatly affected by single data errors ([21]; [19]).

Sen’s method can be used in analyses where the trend can be assumed to be linear, i.e.

$$f(t) = Qt + B \tag{11}$$

where *Q* is the slope, *B* is a constant called the intercept and *t* is time. To evaluate the slope estimate *Q* in equation (11), the slopes of all data value pairs are computed using the equation

$$Q_i = \frac{x_j - x_k}{j - k} \tag{12}$$

where *j* > *k*. If there are *n* values *x*_{j} in the time series, there will be as many as *N* = *n*(*n* − 1)/2 slope estimates *Q*_{i}. Sen’s estimator of slope is the median of these *N* values of *Q*_{i}. To obtain an estimate of *B* in equation (11), the *n* values of the differences *x*_{i} − *Qt*_{i} are calculated. The median of these values gives an estimate of the intercept *B* [21].
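The slope and intercept estimates described above can be sketched as follows. This is a minimal illustration of the median-of-pairwise-slopes idea, written with explicit time values so that unevenly spaced years are handled; the function name `sen_line` is hypothetical, and this is not the VBA program used in the study.

```python
import statistics


def sen_line(t, x):
    """Return (Q, B): Sen's slope and intercept for the line x = Q*t + B."""
    n = len(t)
    if n != len(x) or n < 2:
        raise ValueError("t and x must have the same length (at least 2)")
    # Slopes of all pairs (k, j) with j > k; pairs with equal times are skipped.
    slopes = [(x[j] - x[k]) / (t[j] - t[k])
              for k in range(n - 1)
              for j in range(k + 1, n)
              if t[j] != t[k]]
    q = statistics.median(slopes)
    # Intercept: median of the n differences x_i - Q * t_i.
    b = statistics.median([xi - q * ti for xi, ti in zip(x, t)])
    return q, b
```

For example, `sen_line([1, 2, 3, 4, 5], [1, 2, 3, 4, 100])` returns a slope of 1 and an intercept of 0: the single outlying value changes only four of the ten pairwise slopes, so the median is unaffected. The number of pairwise slopes grows as *n*(*n* − 1)/2, which is small for annual series of a few decades.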

### 2.4. Computation of correlation coefficients

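The correlation coefficient measures the strength of the linear relationship between a dependent and an independent variable ([18]; [22]). The sample Pearson Product Moment Correlation Coefficient (*r*) is obtained using equation (13) [20]:

$$r = \frac{n\sum xy - \sum x\sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}} \tag{13}$$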

A value of *r* close to 0 indicates that there is no linear association between the variables. R-square (*R*^{2}), the square of *r*, is reported alongside *r* in Table 3.

In this study, Microsoft Excel was used for the computation of the relevant statistics and the plotting of figures. A program was also written in Visual Basic for Applications (VBA) to facilitate the computation of the Mann-Kendall statistic *S*, Sen’s slope *Q* and the intercept *B*.
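As an illustration in Python (rather than the Excel and VBA tools used in the study), the sample Pearson correlation between a variable and time can be sketched with the standard product-moment formula. The function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson product-moment correlation coefficient of x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    if den == 0:
        raise ValueError("correlation is undefined when x or y is constant")
    return (n * sxy - sx * sy) / den


def r_squared(x, y):
    """Square of the correlation coefficient."""
    return pearson_r(x, y) ** 2
```

Squaring *r* gives R-square; for instance, *r* = 0.790 corresponds to *R*^{2} ≈ 0.6241, the pair of values listed for maximum temperature in Table 3.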

## 3. Results and discussion

A summary of the computed values of the various parameters resulting from the statistical analysis is presented in this section. Table 1 presents a summary of statistics for the hydro-meteorological variables analysed in this study. A summary of the autocorrelation analysis is presented in Table 2. The correlation coefficients between the climatic variables and time are presented in Table 3. Table 4 summarizes the results of the Mann-Kendall analysis, while the developed Sen model equations and regression model equations are presented in Table 5. Figures 1 to 14 show the time trends of the variables.

Table 1. Summary of statistics for the hydro-meteorological variables.

| Variable | Mean | Variance | Std. Deviation | Skew | Kurtosis |
| --- | --- | --- | --- | --- | --- |
| Minimum Temperature (°C) | 11.68 | 0.30 | 0.55 | 0.816 | 0.588 |
| Maximum Temperature (°C) | 23.16 | 0.93 | 0.96 | -1.072 | 0.477 |
| Average Temperature (°C) | 16.78 | 0.13 | 0.35 | -0.378 | -0.240 |
| Wind Speed (km/h) | 17.99 | 1.75 | 1.32 | 0.327 | -0.860 |
| Reservoir Elevation (m) | 54.16 | 23.57 | 4.86 | -1.446 | 2.187 |
| Watershed Precipitation (mm) | 533.19 | 11685.65 | 108.10 | -0.073 | -0.607 |
| Dam surface Precipitation (ML) | 3972.78 | 3089745.86 | 1757.77 | 0.617 | -0.190 |
| Dam surface Evaporation (ML) | 18696.50 | 8125912.00 | 2850.60 | 0.078 | 1.680 |
| Reservoir Inflow (ML) | 428598.34 | 6.60 × 10^{10} | 256883.91 | 1.579 | 3.281 |
| Reservoir Outflow (ML) | 455289.63 | 1.03 × 10^{11} | 321278.35 | 2.066 | 4.623 |
| Reservoir Storage (ML) | 2430909.53 | 2.35 × 10^{11} | 484349.18 | -1.082 | 0.859 |
| Turbine Release (ML) | 342055.06 | 2.61 × 10^{10} | 161538.93 | 0.895 | -0.048 |
| Irrigation Water Release (ML) | 28280.55 | 89411891 | 9455.79 | 2.173 | 7.313 |
| Municipal Water Supply (ML) | 68.09 | 1075.60 | 32.80 | 2.627 | 6.550 |

Table 2. Summary of the lag 1 autocorrelation analysis.

| Variable | Lag 1 autocorrelation (computed) | Critical value at 95% significance level |
| --- | --- | --- |
| Minimum Temperature (°C) | 0.622 | 0.255 |
| Maximum Temperature (°C) | 0.847 | 0.255 |
| Average Temperature (°C) | 0.616 | 0.255 |
| Wind Speed (km/h) | 0.505 | 0.255 |
| Reservoir Elevation (m) | 0.445 | 0.262 |
| Watershed Precipitation (mm) | 0.156 | 0.287 |
| Dam surface Precipitation (ML) | -0.043 | 0.262 |
| Dam surface Evaporation (ML) | 0.431 | 0.262 |
| Reservoir Inflow (ML) | 0.219 | 0.262 |
| Reservoir Outflow (ML) | 0.237 | 0.262 |
| Reservoir Storage (ML) | 0.442 | 0.262 |
| Turbine Release (ML) | 0.272 | 0.258 |
| Irrigation Water Release (ML) | 0.423 | 0.262 |
| Municipal Water Supply (ML) | 0.734 | 0.262 |

Table 3. Correlation coefficients between the variables and time.

| Variable | Correlation Coefficient | R^{2} |
| --- | --- | --- |
| Minimum Temperature (°C) | -0.073 | 0.0053 |
| Maximum Temperature (°C) | 0.790 | 0.6241 |
| Average Temperature (°C) | 0.164 | 0.0269 |
| Wind Speed (km/h) | 0.341 | 0.1163 |
| Reservoir Elevation (m) | 0.137 | 0.0188 |
| Watershed Precipitation (mm) | -0.190 | 0.0361 |
| Dam surface Precipitation (ML) | 0.253 | 0.0640 |
| Dam surface Evaporation (ML) | -0.265 | 0.0702 |
| Reservoir Inflow (ML) | -0.029 | 0.0008 |
| Reservoir Outflow (ML) | -0.197 | 0.0388 |
| Reservoir Storage (ML) | 0.154 | 0.0237 |
| Turbine Release (ML) | -0.026 | 0.0007 |
| Irrigation Water Release (ML) | 0.687 | 0.4720 |
| Municipal Water Supply (ML) | 0.606 | 0.3672 |

Table 4. Results of the Mann-Kendall analysis.

| Variable | S | Variance (S) | Z | Trend significant at 95% | Trend significant at 99% |
| --- | --- | --- | --- | --- | --- |
| Minimum Temperature (°C) | 55 | 4115.667 | 0.842 | No | No |
| Maximum Temperature (°C) | 334 | 4145.333 | 5.172 | Yes | Yes |
| Average Temperature (°C) | 67 | 4112.333 | 1.029 | No | No |
| Wind Speed (km/h) | 113 | 4151.000 | 1.738 | No | No |
| Reservoir Elevation (m) | 49 | 3461.667 | 0.816 | No | No |
| Watershed Precipitation (mm) | -30 | 1833.333 | -0.677 | No | No |
| Dam surface Precipitation (ML) | 67 | 3461.667 | 1.122 | No | No |
| Dam surface Evaporation (ML) | -93 | 3461.667 | -1.564 | No | No |
| Reservoir Inflow (ML) | -19 | 3461.667 | -0.306 | No | No |
| Reservoir Outflow (ML) | 17 | 3461.667 | 0.272 | No | No |
| Reservoir Storage (ML) | 53 | 3461.667 | 0.884 | No | No |
| Turbine Release (ML) | -28 | 3802.667 | -0.438 | No | No |
| Irrigation Water Release (ML) | 111 | 817.000 | 3.848 | Yes | Yes |
| Municipal Water Supply (ML) | 83 | 817.000 | 2.869 | Yes | Yes |

Table 5. Sen model equations and regression model equations.

| Variable | Sen Model Equation | Regression Model Equation |
| --- | --- | --- |
| Minimum Temperature (°C) | y = 0.0085x - 5.3524 | y = -0.004x + 19.93 |
| Maximum Temperature (°C) | y = 0.0600x - 96.32 | y = 0.078x - 133.7 |
| Average Temperature (°C) | y = 0.0064x + 4.1428 | y = 0.006x + 4.852 |
| Wind Speed (km/h) | y = 0.051x - 83.7370 | y = 0.046x - 74.79 |
| Reservoir Elevation (m) | y = 0.0667x - 77.2877 | y = 0.072x - 91.15 |
| Watershed Precipitation (mm) | y = 5486.6458 - 2.4863x | y = -2.788x + 6079 |
| Dam surface Precipitation (ML) | y = 48.8098x - 93325.5258 | y = 48.84x - 93325 |
| Dam surface Evaporation (ML) | y = 189266.7823 - 85.6231x | y = -83.13x + 18430 |
| Reservoir Inflow (ML) | y = 2343832.67 - 971.18x | y = -815.2x + 2E+06 |
| Reservoir Outflow (ML) | y = 1377.9458x - 2403626.43 | y = 6966.x - 1E+07 |
| Reservoir Storage (ML) | y = 7609x - 12607593.25 | y = 8202.x - 1E+07 |
| Turbine Release (ML) | y = 3689198.6061 - 1709.9297x | y = 243.0x - 13841 |
| Irrigation Water Release (ML) | y = 831.7600x - 1634940.88 | y = 1155.x - 2E+06 |
| Municipal Water Supply (ML) | y = 1.633x - 3207.08 | y = 3.530x - 6990 |

The Mann-Kendall analysis detected an insignificant positive trend in minimum temperatures over the 35-year period of available data (Figure 1). A Mann-Kendall statistic S = 55 and Sen slope estimate Q = 0.0085 indicate a positive trend in the time series. In contrast, the regression slope coefficient a = -0.004 indicates a negative trend. [23] noted that the time series of hydro-meteorological random variables often exhibit a marked skew and are usually not normally distributed. Hence, in this study, the results of the non-parametric Mann-Kendall test and Sen slope estimate are accepted. We therefore conclude that there is a positive trend in the time series of minimum temperature, though the trend is not significant.

For the average temperature (Figure 2), a Mann-Kendall statistic S = 67, Sen slope estimate Q = 0.0064 and regression slope coefficient a = 0.006 all indicate a positive trend. A correlation coefficient r = 0.164 shows a weak relationship, and the Z value of 1.0292 shows that the trend is not significant.

The analysis of the time series of maximum temperature, however, shows beyond reasonable doubt that there is evidence of global warming in the region. The maximum temperature trend is significant at the 95% and 99% levels of significance (Table 4), indicating a highly significant rise in maximum temperatures (Figure 3). A Mann-Kendall statistic S = 334, Sen slope estimate Q = 0.06 and regression slope coefficient a = 0.078 indicate a positive trend. A correlation coefficient r = 0.79 also shows a strong relationship, and a Z value of 5.172 shows that the trend is highly significant.

Wind speed was found to have generally been in an uptrend. A Mann-Kendall statistic S = 113 and positive values of the Sen slope estimate and regression coefficient, accompanied by a Z value of 1.738, show that wind speed has been in an insignificant uptrend (Figure 4).

In order to estimate the effect of global warming on precipitation in the river watershed, the rainfall time series of a nearby town was analysed. The analysis showed that rainfall in areas further away from the water bodies has been in a downtrend, though not a significant one (Figure 5). Analysis of the inflow into the reservoir (Figure 6) also shows an insignificant downward trend. Thus less precipitation on the watershed has resulted in less inflow from the surrounding areas. In contrast, rainfall in the immediate vicinity of the dam, especially on the dam surface, has been in an uptrend (Figure 7). Though the trends are not significant, they are in concordance with the speculation of [6] that areas away from water sources are likely to get less precipitation and dry out, while areas closer to water bodies will get more rainfall, resulting in a localized climate phenomenon. Dam surface evaporation has also been in an insignificant downtrend (Figure 8).

Analysis of reservoir elevation (Figure 9) and reservoir storage (Figure 10), with Z statistics of 0.8185 and 0.8838 respectively, shows that these variables have also been in an insignificant uptrend. The insignificant uptrend in the outflow from the reservoir (Figure 11) suggests that, though there has been a slight drop in inflow to the reservoir due to the dry conditions of the surrounding lands, this has been more than compensated for by the increase in dam surface precipitation and the drop in reservoir evaporation.

The large increase in irrigation water supply from the reservoir is, among many other factors, partly due to the impact of climate change in the surrounding area. Drying farmlands in the surrounding areas require more irrigation water from the reservoir. A Mann-Kendall statistic S = 111, Sen slope estimate Q = 831.76 and regression slope coefficient a = 1155 all indicate a strong positive trend. A correlation coefficient r = 0.687 shows a strong relationship, and a Z value of 3.848 shows that the trend is significant at the 95% and 99% levels (Table 4). Likewise, municipal water supply to surrounding towns has been in a highly significant uptrend (Figure 13). Figure 14 shows that there has been a slight drop in the water supplied to the power industry.

Developing quantitative functional relationships between hydro-meteorological random variables and time helps provide insight into how the variables trend and provides a means of extrapolation for future climate scenarios. Quantifying the extent to which precipitation and streamflow changes are due to changes in regional climate is an important problem in hydrology. Specifically, in this study, an attempt was made to develop models that relate climatic variables to time and provide a means of estimating future climate change in quantitative terms for the Vanderkloof River basin in South Africa.
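As a hedged illustration of how such model equations can be evaluated, the sketch below applies the two maximum temperature equations from Table 5, taking *x* to be the calendar year (an assumption consistent with the size of the intercepts). The helper name is hypothetical, and values computed for years outside the 1977 to 2011 record are simple linear extrapolations, not validated forecasts.

```python
def evaluate_linear_model(slope, intercept, year):
    """Evaluate a linear trend model y = slope * year + intercept."""
    return slope * year + intercept


# Maximum temperature models as printed in Table 5 (rounded coefficients).
MAX_TEMP_SEN = (0.0600, -96.32)        # y = 0.0600x - 96.32
MAX_TEMP_REGRESSION = (0.078, -133.7)  # y = 0.078x - 133.7

if __name__ == "__main__":
    for year in (1977, 2011):
        sen = evaluate_linear_model(*MAX_TEMP_SEN, year)
        reg = evaluate_linear_model(*MAX_TEMP_REGRESSION, year)
        print(f"{year}: Sen {sen:.2f} °C, regression {reg:.2f} °C")
```

Because the coefficients in Table 5 are rounded and multiplied by years of about 2000, small rounding differences in the slope shift the result noticeably; equations whose intercepts are printed as 1E+07 or 2E+06 are too coarsely rounded to be evaluated this way.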

From the results of the analyses, maximum temperatures in the vicinity of the dam have been in a significant uptrend over the years. Thus it may be concluded that there is evidence of global warming in the region. Global warming has produced favourable climate conditions around Vanderkloof dam, as is evident from the slight uptrend in rainfall on the dam surface. The non-significant decrease in inflow to the dam has been balanced by the increased precipitation on the reservoir surface. The significant uptrend in irrigation water supply to surrounding farmlands due to drier conditions and the significant increase in municipal water supply have resulted in a slight, though not significant, decrease in water supply to the power industry. The slight uptrend in the outflow from the reservoir suggests that water supply for various uses is still sustainable under the prevailing climate conditions if properly allocated. It is recommended that the operators of the dam optimize the release of water so as to sustain power generation, irrigation and allocation for other uses, and thereby maximize the net benefit obtainable from the reservoir under the prevailing climate conditions.