Open access peer-reviewed chapter

Prediction of Relative Humidity in a High Elevated Basin of Western Karakoram by Using Different Machine Learning Models

Written By

Muhammad Adnan, Rana Muhammad Adnan, Shiyin Liu, Muhammad Saifullah, Yasir Latif and Mudassar Iqbal

Submitted: 02 March 2021 Reviewed: 03 May 2021 Published: 03 June 2021

DOI: 10.5772/intechopen.98226

From the Edited Volume

Weather Forecasting

Edited by Muhammad Saifullah

Abstract

Accurate and reliable prediction of relative humidity is of great importance in all fields concerning global climate change. The current study has employed Multivariate Adaptive Regression Spline (MARS) and M5 Tree (M5T) models to predict the relative humidity in the Hunza River basin, Pakistan. Both the models provided the best prediction for the input scenario S6 (RHt-1, RHt-2, RHt-3, Tt-1, Tt-2, Tt-3). The statistical analysis displayed that the MARS model provided a better prediction of relative humidity as compared to M5T at all meteorological stations, especially, at Ziarat followed by Khunjerab and Naltar. The values of root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2) were (5.98%, 5.43%, and 0.808) for Khunjerab; (6.58%, 5.08%, and 0.806) for Naltar; and (5.86%, 4.97%, 0.815) for Ziarat during the testing of MARS model whereas, the values were (6.14%, 5.56%, and 0.772) for Khunjerab; (6.19%, 5.58% and 0.762) for Naltar and (6.08%, 5.46%, 0.783) for Ziarat during the testing of M5T model. Both the models performed slightly better in training as compared to the testing stage. The current study encourages future research to be conducted at high altitude basins for the prediction of other meteorological variables using machine learning tools.

Keywords

  • relative humidity
  • MARS
  • M5T
  • Hunza
  • machine learning

1. Introduction

Relative humidity is defined as the amount of water vapor in the air relative to the amount at full saturation [1, 2]. As an important indicator in precipitation forecasting, its prediction plays a significant part in improving the accuracy of weather forecasts [3]. Relative humidity changes with saturated vapor pressure, which in turn depends on wind speed, solar radiation, pressure, temperature, and the moisture content of the air [1]. It is a function of temperature and is regarded as a sensitive parameter in the field of science [4]. Relative humidity plays a vital role in plant growth, agricultural and industrial production, and the prevention and control of air pollution [5]; in the economic stability of a region, water systems, and the management of renewable and solar energy systems [1, 6]; and in weather and climate [7, 8]. Moreover, it also affects ozone concentration and adaptive thermal comfort [9]. Given this importance, research on the prediction of relative humidity is increasingly needed [7].

Relative humidity is an important component of the hydrological cycle [8] and plays a role in alpine hydrology, especially in cold and dry climates, where any change in temperature and humidity causes larger variations in the ablation of glaciers [10]. Glaciers in warm environments are influenced more strongly by changes in relative humidity. A few other studies, e.g. [11, 12, 13], also observed that tropical glaciers are sensitive to subtle changes in relative humidity, precipitation, and cloudiness. Relative humidity and clouds play an important role in the energy balance of glaciers by controlling the amount of outgoing longwave radiation. Moreover, relative humidity and wind speed influence the turbulent latent heat flux, which supplies all the energy for sublimation, and thus they indirectly control the equilibrium line altitude (ELA) [14]. Another study [15] observed that relative humidity affects evaporation and that the two are inversely related. Evaporation in turn controls the water balance of closed lakes in hilly areas and evapotranspiration, especially in irrigated agricultural areas.

Although relative humidity is an important component of hydrology, meteorology, and climate, only a few studies address its prediction. The study in [1] used artificial neural network (ANN) and gene expression programming (GEP) models to predict relative humidity as a function of three meteorological variables (wind speed, temperature, and pressure) at two Californian gauging stations. The authors observed that both models can successfully predict relative humidity one year into the future. Another study [5] predicted relative humidity using Extreme Gradient Boosting (XGBoost), Seasonal Auto-Regressive Integrated Moving Average (SARIMA), and Holt-Winters (HW) time series models. XGBoost was found to be more accurate because of its robust ability to resist overfitting. The study in [3] found that an autoregressive integrated moving average (ARIMA) model performs better than a Long Short-Term Memory (LSTM) network for the prediction of relative humidity. In contrast, [8] observed that the LSTM network is capable of predicting complex univariate relative humidity time series with strong non-stationarity. In addition, Least Square Support Vector Machine (LSSVM) and Adaptive Network-Based Fuzzy Inference System (ANFIS) models were used by [2] to predict relative humidity from dry bulb temperature and wet bulb depression, with satisfactory results.

Another study [16] proposed four ANN models to predict relative humidity and temperature in a swine livestock warehouse located in Puerto Gaitan–Meta. The authors observed that the models are suitable for predicting humidity in barns not equipped with humidity sensors. In [17], an improved backpropagation (BP) neural network was used to predict indoor relative humidity and temperature every 10 min and 6–72 hours in advance based on a cloud database in Chongqing, China; both the temperature and humidity predictions correlated strongly with the observed data. Similarly, [18] used a BP neural network to predict the one-day-ahead mean air temperature and relative humidity of a greenhouse located in the sub-humid sub-tropical regions of India, and the results showed that the BP neural network model provided the best prediction of inside temperature and relative humidity. In contrast, [19] used daily minimum air temperature (Tn) downscaled from the INMCM4 general circulation model (GCM) to predict relative humidity for climate change studies, but the predictions were poor in a few months, especially March, July, August, and October. Moreover, [20] proposed a Functional Link Neural Network (FLNN), which comprises a single layer of tunable weights trained with the Modified Cuckoo Search (MCS) algorithm, for the prediction of daily temperature and relative humidity. The FLNN produced a lower prediction error when trained with MCS. Further, [21] used a fitting ANN model to predict relative humidity and temperature at different locations inside a tobacco dryer. Another study [22] also used different ANN models to successfully forecast indoor relative humidity and temperature in an education building in Izmir, Turkey.

To date, no attempt has been made to predict relative humidity in an alpine catchment where data are scarce. The current study is unique because it uses two machine learning models, MARS and M5T, to predict relative humidity in the Hunza basin, a glaciated basin in Pakistan. The MARS model was selected because it requires a short training process and can model complex nonlinear processes without strong model assumptions, in contrast to ANN models [23, 24], whereas the M5T model was selected because of its low computational cost and ease of handling large data sets compared with support vector machine (SVM) and ANN models [25, 26]. In previous studies, these models were mostly used for the prediction of runoff in poorly gauged basins. The study in [27] suggested that the MARS method is capable of short-term runoff forecasting in mountainous watersheds, and MARS was successfully used by [28] to predict streamflow in a mountainous catchment with inadequate input data. Similarly, the M5T model was found useful by [29] for predicting the streamflow of several tributaries, with good predictions in rainless periods. Another study [30] found the M5T algorithm reliable for streamflow prediction. Several other studies also encouraged the use of MARS and M5T models for runoff prediction, e.g. [31, 32, 33, 34, 35, 36, 37]. Apart from runoff prediction, MARS and M5T models have also been used for the prediction of evapotranspiration (ET) and pan evaporation (Ep). The study in [38] compared the performance of M5T and MARS with calibrated Hargreaves-Samani (CHS), MLP, and Stephens-Stewart (SS) models and observed that MARS performed better in the prediction of Ep. Another study [39] found that the M5T model outperformed the Ritchie equation for the prediction of ET. Similarly, [40] successfully predicted reference evapotranspiration by using M5T and ANN models.

2. Study area

Hunza is a glaciated sub-catchment of the Upper Indus Basin (UIB) and is located in the western Karakoram Himalayan region of Pakistan (Figure 1). The basin lies within the extent of 74°02′–75°48′E and 35°54′–37°05′N and encompasses 13,671 km2 of the catchment area.

Figure 1.

Location map of the study area.

The elevation of the basin ranged from 1391 to 7850 m. About 20% catchment area of the basin is covered by glaciers [41] and there are 110 glacial lakes in the basin [42]. It is the main tributary of the Indus Basin Irrigation System (IBIS) and it contributes about 12% of UIB streamflows upstream of Tarbela dam [43]. The climate of the Hunza basin is arid to semi-arid and is normally categorized by two seasons, October to March as winter and April to September as summer. The weather conditions vary within the basin. At low altitudes, weather is hot whereas at high altitudes winters are cold and there are extensive variations in temperature extremes [44]. The mean total annual precipitation varies with respect to altitude; low altitude station such as Naltar (2858 m) receives more precipitation i.e. 660 mm as compared to high altitude station Khunjerab (4730 m) which receives 165 mm of precipitation. The meteorological station installed in between Naltar and Khunjerab (i.e. Ziarat, 3669 m) receives 292 mm of precipitation [45, 46].

The temporal variations in the meteorological variables at Khunjerab station (using data from 1995–2009) are displayed in Table 1. The maximum temperature varies from −11.1°C (January) to 11.6°C (July), whereas the minimum temperature varies from −21.3°C (January) to 1.3°C (July). The maximum relative humidity varies from 59% (March) to 91% (August), while the minimum relative humidity varies from 23% (March) to 52% (December). The daily solar radiation varies from 2563 watt/m2 (December) to 5148 watt/m2 (May).

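| Month | Maximum Temperature (°C) | Minimum Temperature (°C) | Maximum Relative Humidity (%) | Minimum Relative Humidity (%) | Solar Radiation (watt/m2) |
|---|---|---|---|---|---|
| January | −11.1 | −21.3 | 62 | 30 | 2933 |
| February | −11.0 | −19.7 | 77 | 34 | 3500 |
| March | −4.5 | −16.9 | 59 | 23 | 4394 |
| April | 0.2 | −9.7 | 78 | 30 | 4750 |
| May | 5.5 | −4.6 | 81 | 26 | 5148 |
| June | 7.7 | −1.3 | 87 | 44 | 5102 |
| July | 11.6 | 1.3 | 86 | 38 | 4858 |
| August | 10.5 | −0.3 | 91 | 34 | 4711 |
| September | 4.6 | −4.5 | 86 | 27 | 4227 |
| October | 0.8 | −10.4 | 78 | 35 | 4003 |
| November | −5.8 | −16.1 | 68 | 39 | 3452 |
| December | −11.0 | −18.9 | 80 | 52 | 2563 |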

Table 1.

Monthly average variations in meteorological variables of Khunjerab (1995–2012).

3. Material and methods

3.1 Topography

The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global Digital Elevation Model (GDEM) v3 was used to delineate the catchment boundary of the Hunza basin in ArcGIS. The data were acquired from the website: https://lpdaac.usgs.gov/tools/data-pool/. The downloaded tiles were in GeoTIFF format, with a grid resolution of 30 m and a tile structure of 1° × 1°.

3.2 Meteorological data

There are four meteorological stations in the Hunza River basin: Hunza, Naltar, Khunjerab, and Ziarat (Table 2). The Hunza station was installed by the Pakistan Meteorological Department (PMD), and its record is available from 2007 onward, whereas the other three stations were installed and are managed by the Water and Power Development Authority (WAPDA), and their records are available from 1995 onward. The current study employed daily temperature, precipitation, solar radiation, and relative humidity data from the Ziarat, Naltar, and Khunjerab stations. These data, covering 1995 to 2009, were acquired from the Surface Water Hydrology Project of the Water and Power Development Authority (SWHP-WAPDA), Pakistan (Table 2).

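| Meteorological Station | Latitude (DD) | Longitude (DD) | Elevation (m) | Data | Agency |
|---|---|---|---|---|---|
| Hunza | 36.320 | 74.640 | 2374 | P, Tmax, Tmin, RH, SR | PMD |
| Naltar | 36.216 | 74.266 | 2858 | P, Tmax, Tmin, RH, SR | SWHP-WAPDA |
| Khunjerab | 36.850 | 75.400 | 4730 | P, Tmax, Tmin, RH, SR | SWHP-WAPDA |
| Ziarat | 36.830 | 74.430 | 3669 | P, Tmax, Tmin, RH, SR | SWHP-WAPDA |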

Table 2.

List of meteorological stations in the Hunza basin.

Note: DD = Degree decimal; P= Precipitation; Tmax= Maximum temperature; Tmin= Minimum temperature; RH = Relative humidity; SR = Solar radiation.

3.3 Machine learning models

The current study employed two machine learning models, M5 Tree and MARS, for the prediction of relative humidity at three meteorological stations of the Hunza basin. Their detailed description is given below:

3.3.1 M5 tree model

The M5T model was first introduced by [47]. Model trees generalize the concept of regression trees, which have constant values at their leaves [48]. The M5T model is based on a binary decision tree in which linear regression functions are placed at the terminal nodes (leaves), and through these a relationship is developed between the dependent and independent variables [49]. Model development involves two stages: in the first, a decision tree is created using a splitting criterion; in the second, the overgrown tree is pruned to produce the model tree [25]. Because the leaves hold regression functions instead of class labels, the model can estimate continuous numerical attributes [36]. The splitting criterion is based on the standard deviation reduction (SDR) achieved at each node. The standard deviation of the values reaching a node is treated as a measure of the error at that node, and the model tests each attribute at the node to calculate the expected reduction in this error [50, 51]. The SDR in the M5T model is calculated by the following equation [47]:

$$SDR = sd(M) - \sum_{i}\frac{|M_i|}{|M|}\, sd(M_i) \tag{1}$$

where SDR is the standard deviation reduction and sd denotes the standard deviation; M is the set of examples that reach the node; and Mi is the subset of examples that have the ith outcome of the potential set.
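The SDR criterion can be illustrated with a minimal Python sketch. It assumes a single numeric predictor, the population standard deviation, and candidate thresholds at the midpoints between sorted predictor values; the function names `sdr` and `best_threshold` are hypothetical and are not taken from the M5 implementation used in this study.

```python
from statistics import pstdev

def sdr(parent, subsets):
    """Standard deviation reduction of one candidate split, as in Eq. (1).

    parent  : target values reaching the node (the set M)
    subsets : list of target-value lists, one per split outcome (the M_i)
    Assumption: population standard deviation is used for sd.
    """
    n = len(parent)
    weighted = sum(len(s) / n * pstdev(s) for s in subsets if s)
    return pstdev(parent) - weighted

def best_threshold(x, y):
    """Scan midpoints between sorted x values; return (threshold, sdr)."""
    pairs = sorted(zip(x, y))
    best = (None, float("-inf"))
    for (x0, _), (x1, _) in zip(pairs, pairs[1:]):
        if x0 == x1:
            continue  # no threshold can separate equal predictor values
        t = (x0 + x1) / 2
        left = [v for u, v in pairs if u <= t]
        right = [v for u, v in pairs if u > t]
        gain = sdr(y, [left, right])
        if gain > best[1]:
            best = (t, gain)
    return best

# Illustrative values only (not station data): temperature vs. humidity.
temperature = [1, 2, 3, 10, 11, 12]
humidity = [10, 11, 12, 50, 51, 52]
print(best_threshold(temperature, humidity))
```

The split with the largest SDR is the one that leaves the most homogeneous target values in the child nodes.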

Because of the splitting or branching process, the data in child nodes (smaller nodes) have a smaller standard deviation than in parent nodes (greater nodes). The division process often produces a large tree-like structure that overfits, and this issue can be resolved by pruning back the tree [52], for instance by replacing a subtree with a leaf. Pruning the overgrown tree and replacing subtrees with linear regression functions are performed in the second stage of model design. This method of producing the model tree separates the parameter space into subspaces and builds a linear regression model in each of them.
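The idea of one linear regression model per subspace can be sketched with a model tree that has a single split. This is a simplified illustration, not the pruned tree produced by M5: the threshold is supplied by hand, there is only one predictor, both sides of the split are assumed to be non-empty, and the helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # constant predictor: fall back to the mean
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def fit_one_split_model_tree(xs, ys, threshold):
    """Fit one linear leaf model on each side of a given threshold."""
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "threshold": threshold,
        "left": fit_line(*zip(*left)),
        "right": fit_line(*zip(*right)),
    }

def predict(tree, x):
    """Route x to its leaf and evaluate that leaf's linear model."""
    a, b = tree["left"] if x <= tree["threshold"] else tree["right"]
    return a + b * x

# Illustrative values only: a rising segment followed by a falling one.
tree = fit_one_split_model_tree([0, 1, 2, 3, 4, 5], [0, 1, 2, 10, 8, 6], 2.5)
print(predict(tree, 1.5), predict(tree, 4.5))
```

A regression tree would store a constant at each leaf; the model tree stores the pair (a, b) instead, which is what lets it follow local linear trends.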

3.3.2 MARS algorithm

The MARS model was first developed by [53]. It establishes a relationship between a set of input variables and the dependent target variable that involves interactions among a small number of variables [54]. MARS produces flexible models by dividing the solution space into several intervals of the independent parameters and fitting an individual spline to each interval [53]. The method is non-parametric and non-linear, and it uses a forward-backward procedure to predict a continuous dependent parameter in high-dimensional data [55]. The MARS model makes no assumptions about the underlying functional relationship between the independent and dependent variables. In MARS, the splines are connected smoothly to form piecewise curves, also known as basis functions (BFs), and these form a flexible model capable of handling both linear and non-linear behavior [54]. Setting up the MARS model involves two stages: a forward stage (constructing the model) and a backward stage (a pruning procedure). In the forward stage, knots are placed within the range of each predictor variable to define pairs of candidate BFs. At each step, the model selects the knot and its corresponding pair of BFs that produce the maximum reduction in the sum-of-squares residual error. This process of adding BFs continues and generally produces a very complex, overfitted model. In the backward stage, the overfitted model is pruned by deleting the least important, redundant BFs [54, 55].

The MARS model f(X) is generally expressed by the following equation;

$$f(X) = \delta_0 + \sum_{m=1}^{M}\delta_m h_m(X) \tag{2}$$

where δ0 and δm denote the coefficients, which are estimated by minimizing the sum of squared errors; hm(X) represents the spline functions; and M denotes the number of functions. The pruning stage improves the forecasting accuracy of the model, and M is determined during this phase [55].
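The form of Eq. (2) can be made concrete with a short Python sketch. It assumes hinge functions of the form max(0, x − t) and max(0, t − x), which are the usual MARS basis functions, and a single predictor; the coefficient and knot values below are invented for illustration and are not fitted to the Hunza data.

```python
def hinge(x, knot, direction=+1):
    """Hinge basis function.

    direction=+1 gives max(0, x - knot); direction=-1 gives max(0, knot - x).
    """
    return max(0.0, direction * (x - knot))

def mars_predict(x, intercept, terms):
    """Evaluate f(x) = delta_0 + sum_m delta_m * h_m(x).

    terms is a list of (delta_m, knot, direction) tuples, one per basis
    function that survived the backward (pruning) stage.
    """
    return intercept + sum(c * hinge(x, k, d) for c, k, d in terms)

# Hypothetical model: a mirrored pair of basis functions at a knot of 5.0.
terms = [(2.0, 5.0, +1), (-1.0, 5.0, -1)]
for x in (3.0, 5.0, 7.0):
    print(x, mars_predict(x, 60.0, terms))
```

Each knot contributes a pair of mirrored hinges, so the fitted curve can change slope at the knot while remaining continuous.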

3.4 Model setup

The current study compares the accuracy of two machine learning methods, MARS and M5Tree, for the prediction of daily relative humidity using different input combinations of precipitation, temperature, and relative humidity. Each model was applied separately to the three meteorological stations (Khunjerab, Naltar, and Ziarat). The flowchart of the current study is displayed in Figure 2. Ten input combinations were developed for each station and each model to identify the best combination for predicting relative humidity (RH). Initially, three combinations of preceding RH values, (i) RHt-1, (ii) RHt-1 and RHt-2, and (iii) RHt-1, RHt-2, and RHt-3, were supplied to both models to predict the current RH (RHt). After that, three precipitation combinations ((i) Pt-1; (ii) Pt-1, Pt-2; (iii) Pt-1, Pt-2, Pt-3) and three temperature combinations ((i) Tt-1; (ii) Tt-1, Tt-2; (iii) Tt-1, Tt-2, Tt-3) were separately added to the best RH combination. In the tenth and last combination, the best temperature and precipitation inputs were added together to the best RH combination to see the combined effect of both parameters on the models' accuracy in predicting relative humidity.
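The construction of the lagged input combinations can be sketched in Python as follows. The scenario definitions mirror the ten combinations S1 to S10 listed in Tables 3–8; the data layout (a dictionary of equal-length daily lists keyed by "RH", "T", and "P") and the function name `build_samples` are assumptions made for illustration.

```python
# Number of lags per variable for each scenario, following Tables 3-8.
SCENARIOS = {
    "S1": {"RH": 1},
    "S2": {"RH": 2},
    "S3": {"RH": 3},
    "S4": {"RH": 3, "T": 1},
    "S5": {"RH": 3, "T": 2},
    "S6": {"RH": 3, "T": 3},
    "S7": {"RH": 3, "P": 1},
    "S8": {"RH": 3, "P": 2},
    "S9": {"RH": 3, "P": 3},
    "S10": {"RH": 3, "P": 3, "T": 3},
}

def build_samples(data, scenario):
    """Return (X, y): lagged predictors and the current-day RH target.

    data     : dict of equal-length daily lists keyed by "RH", "T", "P"
    scenario : dict mapping a variable name to its number of lags
    """
    max_lag = max(scenario.values())
    X, y = [], []
    for t in range(max_lag, len(data["RH"])):
        row = [data[var][t - k]
               for var, n_lags in scenario.items()
               for k in range(1, n_lags + 1)]
        X.append(row)
        y.append(data["RH"][t])
    return X, y

# Illustrative five-day record (not station data).
data = {"RH": [50, 52, 54, 56, 58], "T": [1, 2, 3, 4, 5], "P": [0, 0, 1, 0, 2]}
X, y = build_samples(data, SCENARIOS["S6"])
print(X, y)
```

The first `max_lag` days are dropped because their lagged predictors are not available, so scenarios with three lags yield three fewer samples than the record length.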

Figure 2.

Flowchart of the study.

The current analysis involves daily data of precipitation, temperature, and relative humidity from 1995 to 2009. About 75% of input data i.e. from 1995 to 2006 was used for training whereas 25% of input data i.e. from 2007 to 2009 was used for testing in both machine learning models for prediction of relative humidity. However, [8] used only two-year data i.e. 2008 to 2009 for training the LSTM model which might not be enough for reliable predictions.
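A chronological split at the 2006/2007 boundary can be sketched as below. The sketch assumes a gap-free daily calendar, which may not match the station records; the function name `split_by_year` is hypothetical.

```python
from datetime import date, timedelta

def split_by_year(dates, rows, last_train_year=2006):
    """Split rows chronologically: years up to last_train_year go to training."""
    train = [r for d, r in zip(dates, rows) if d.year <= last_train_year]
    test = [r for d, r in zip(dates, rows) if d.year > last_train_year]
    return train, test

# Assumed gap-free daily calendar for 1995-2009.
start, end = date(1995, 1, 1), date(2009, 12, 31)
dates = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# The dates themselves stand in for the data rows in this illustration.
train, test = split_by_year(dates, dates)
print(len(train), len(test), len(train) / len(dates))
```

Splitting by date rather than at random keeps every test day later than every training day, which matters when lagged values are used as inputs. On a gap-free calendar this year boundary places roughly 80% of the days in the training set; the proportions stated in the text are taken as given and may reflect the records actually available.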

3.5 Models evaluation criteria

The models’ accuracy in relative humidity prediction against observed data was evaluated using the following statistics which are normally used in the related literature. The statistics include R2, RMSE, and MAE as shown in Eqs. (3)-(5).

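$$R^2 = 1 - \frac{\frac{1}{n}\sum_{i=1}^{n}\left(RH_i - \overline{RH}\right)^2}{\frac{1}{n}\sum_{i=1}^{n}\left(rh_i - \overline{rh}\right)^2} \tag{3}$$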
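$$RMSE = \sqrt{\frac{\sum_{i=1}^{N}\left(RH_i^{O} - RH_i^{M}\right)^2}{N}} \tag{4}$$

$$MAE = \frac{\sum_{i=1}^{N}\left|RH_i^{O} - RH_i^{M}\right|}{N} \tag{5}$$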

where rh̄ is the mean observed relative humidity; RH̄ is the mean of the predicted relative humidity RHi; and N is the number of data points. Moreover, RHiO is the observed relative humidity and RHiM is the modeled relative humidity. Previous studies such as [56, 57, 58, 59, 60, 61] suggested that a single statistical indicator cannot adequately assess the prediction accuracy of soft computing models. Therefore, the current study used three statistical indicators to judge the prediction accuracy with confidence. Error statistics such as RMSE and MAE are more suitable when the error distributions of the models are normal and uniform. For an ideal model, RMSE and MAE should equal 0, whereas R2 should equal 1. The model with relatively small values of MAE and RMSE compared with the other models is considered the best model.
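The error statistics can be computed with a few lines of Python. RMSE and MAE follow Eqs. (4) and (5) directly. For R2, the sketch uses the common coefficient-of-determination form, one minus the ratio of the residual sum of squares to the total sum of squares; this is an assumption and may differ from the exact expression in Eq. (3). The sample values are invented.

```python
from math import sqrt

def rmse(obs, mod):
    """Root mean square error between observed and modeled values."""
    return sqrt(sum((o - m) ** 2 for o, m in zip(obs, mod)) / len(obs))

def mae(obs, mod):
    """Mean absolute error between observed and modeled values."""
    return sum(abs(o - m) for o, m in zip(obs, mod)) / len(obs)

def r2(obs, mod):
    """Coefficient of determination in its common 1 - SSE/SST form.

    Assumption: this form is used for illustration only.
    """
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - m) ** 2 for o, m in zip(obs, mod))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - sse / sst

# Illustrative relative humidity values in percent (not station data).
observed = [60, 70, 80, 90]
modeled = [62, 68, 83, 89]
print(rmse(observed, modeled), mae(observed, modeled), r2(observed, modeled))
```

RMSE penalizes large errors more heavily than MAE because the differences are squared before averaging, which is one reason the two are reported together.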

4. Results and discussions

4.1 Performance evaluation of MARS model in predicting relative humidity

The performance evaluation statistics of the MARS model for the prediction of relative humidity at Khunjerab, Naltar, and Ziarat are presented in Tables 3–5, respectively. The MARS model performed well at all meteorological stations during both training and testing, and it provided the best predictions for the sixth input scenario (S6), which is highlighted in bold. The RMSE, MAE, and R2 values for Khunjerab during training (5.58%, 4.51%, 0.852) and testing (5.98%, 5.43%, 0.808) are displayed in Table 3. The MARS model performed better during training than during testing at Khunjerab. However, it did not perform well for the S1, S2, and S3 scenarios. These results were found to be better than those of the study conducted by [1], which described that GEP and ANN models can predict relative humidity reliably at two Californian stations. The GEP model in that study gave RMSE = 10.7%, MAE = 7.6%, and R2 = 0.73 during training and RMSE = 10.1%, MAE = 7.5%, and R2 = 0.714 during testing. The ANN model produced better results than GEP, with RMSE = 7.8%, MAE = 3.6%, and R2 = 0.826 during training and RMSE = 8.2%, MAE = 4.1%, and R2 = 0.751 during testing.

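| Scenario | Input Combinations | Training RMSE (%) | Training MAE (%) | Training R2 | Testing RMSE (%) | Testing MAE (%) | Testing R2 |
|---|---|---|---|---|---|---|---|
| S1 | RHt-1 | 11.00 | 8.49 | 0.480 | 13.59 | 10.51 | 0.381 |
| S2 | RHt-1, RHt-2 | 10.88 | 8.39 | 0.491 | 13.58 | 10.51 | 0.385 |
| S3 | RHt-1, RHt-2, RHt-3 | 10.86 | 8.36 | 0.493 | 13.53 | 10.45 | 0.388 |
| S4 | RHt-1, RHt-2, RHt-3, Tt-1 | 5.78 | 4.64 | 0.823 | 6.43 | 5.62 | 0.782 |
| S5 | RHt-1, RHt-2, RHt-3, Tt-1, Tt-2 | 5.64 | 4.57 | 0.831 | 6.12 | 5.57 | 0.801 |
| **S6** | **RHt-1, RHt-2, RHt-3, Tt-1, Tt-2, Tt-3** | **5.58** | **4.51** | **0.852** | **5.98** | **5.43** | **0.808** |
| S7 | RHt-1, RHt-2, RHt-3, Pt-1 | 9.76 | 7.62 | 0.602 | 11.67 | 8.75 | 0.532 |
| S8 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2 | 9.16 | 7.28 | 0.643 | 10.87 | 8.38 | 0.544 |
| S9 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2, Pt-3 | 9.08 | 7.19 | 0.649 | 10.73 | 8.29 | 0.552 |
| S10 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2, Pt-3, Tt-1, Tt-2, Tt-3 | 6.21 | 5.13 | 0.802 | 6.03 | 5.53 | 0.803 |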

Table 3.

The statistical evaluation of the MARS model at Khunjerab.

Bold values represent the best input data combination.


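| Scenario | Input Combinations | Training RMSE (%) | Training MAE (%) | Training R2 | Testing RMSE (%) | Testing MAE (%) | Testing R2 |
|---|---|---|---|---|---|---|---|
| S1 | RHt-1 | 11.42 | 8.95 | 0.588 | 14.22 | 10.69 | 0.499 |
| S2 | RHt-1, RHt-2 | 11.09 | 8.65 | 0.612 | 13.99 | 10.31 | 0.518 |
| S3 | RHt-1, RHt-2, RHt-3 | 11.02 | 8.61 | 0.616 | 13.98 | 10.32 | 0.517 |
| S4 | RHt-1, RHt-2, RHt-3, Tt-1 | 5.84 | 4.73 | 0.812 | 6.84 | 5.34 | 0.783 |
| S5 | RHt-1, RHt-2, RHt-3, Tt-1, Tt-2 | 5.76 | 4.62 | 0.818 | 6.73 | 5.25 | 0.792 |
| **S6** | **RHt-1, RHt-2, RHt-3, Tt-1, Tt-2, Tt-3** | **5.63** | **4.53** | **0.826** | **6.58** | **5.08** | **0.806** |
| S7 | RHt-1, RHt-2, RHt-3, Pt-1 | 10.24 | 7.63 | 0.673 | 11.73 | 8.36 | 0.624 |
| S8 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2 | 10.08 | 7.46 | 0.692 | 11.29 | 8.07 | 0.645 |
| S9 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2, Pt-3 | 9.36 | 7.13 | 0.724 | 10.76 | 7.93 | 0.663 |
| S10 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2, Pt-3, Tt-1, Tt-2, Tt-3 | 5.71 | 4.71 | 0.815 | 6.74 | 5.18 | 0.796 |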

Table 4.

The statistical evaluation of the MARS model at Naltar.

Bold values represent the best input data combination.


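| Scenario | Input Combinations | Training RMSE (%) | Training MAE (%) | Training R2 | Testing RMSE (%) | Testing MAE (%) | Testing R2 |
|---|---|---|---|---|---|---|---|
| S1 | RHt-1 | 11.29 | 8.58 | 0.530 | 14.75 | 10.90 | 0.420 |
| S2 | RHt-1, RHt-2 | 11.17 | 8.49 | 0.540 | 14.73 | 10.86 | 0.424 |
| S3 | RHt-1, RHt-2, RHt-3 | 11.12 | 8.46 | 0.544 | 14.77 | 10.89 | 0.421 |
| S4 | RHt-1, RHt-2, RHt-3, Tt-1 | 5.73 | 4.92 | 0.801 | 6.13 | 5.23 | 0.792 |
| S5 | RHt-1, RHt-2, RHt-3, Tt-1, Tt-2 | 5.58 | 4.76 | 0.813 | 6.02 | 5.06 | 0.807 |
| **S6** | **RHt-1, RHt-2, RHt-3, Tt-1, Tt-2, Tt-3** | **5.26** | **4.59** | **0.833** | **5.86** | **4.97** | **0.815** |
| S7 | RHt-1, RHt-2, RHt-3, Pt-1 | 10.03 | 7.13 | 0.624 | 12.75 | 8.34 | 0.542 |
| S8 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2 | 9.75 | 6.48 | 0.687 | 11.78 | 8.07 | 0.568 |
| S9 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2, Pt-3 | 9.48 | 6.29 | 0.698 | 11.38 | 7.84 | 0.597 |
| S10 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2, Pt-3, Tt-1, Tt-2, Tt-3 | 5.38 | 4.68 | 0.820 | 5.94 | 5.02 | 0.812 |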

Table 5.

The statistical evaluation of the MARS model at Ziarat.

Bold values represent the best input data combination.


Similarly, the MARS model provided the best prediction of relative humidity for the S6 input scenario at Naltar during both the training and testing stages, as shown in Table 4. The RMSE, MAE, and R2 values for the best input combination were 5.63%, 4.53%, and 0.826, respectively, during training and 6.58%, 5.08%, and 0.806 during testing (Table 4). The MARS model did not perform well for the S1, S2, and S3 input combinations. By comparison, a study conducted by [5] observed that the XGBoost model provided the best prediction of relative humidity (MAE = 2.29%) compared with SARIMA (MAE = 2.97%) and HW additive (MAE = 2.74%).

The MARS model performed best (RMSE = 5.86%, MAE = 4.97%, R2 = 0.815) at Ziarat for the S6 input combination during the testing stage, as shown in Table 5. It also performed fairly well during the training stage (RMSE = 5.26%, MAE = 4.59%, R2 = 0.833) for the S6 input combination. The MARS model provided a poor prediction of relative humidity for the S1, S2, and S3 input scenarios (Table 5). Overall, during training with the S6 input combination, the MARS model achieved its highest R2 at Khunjerab (0.852) and a slightly lower value at Naltar (0.826) (Tables 3–5).

The MARS model performance was also evaluated by drawing scatter plots of observed against predicted daily relative humidity from 2007 to 2009, as displayed in Figure 3. The scatter plots also showed that the MARS model predicted relative humidity well at all meteorological stations, especially at Ziarat, with R2 = 0.815 for the S6 input combination during the testing stage (Figure 3).

Figure 3.

Scatter plots between observed and predicted relative humidity by using MARS model at (a) Khunjerab; (b) Naltar and (c) Ziarat.

4.2 Performance evaluation of M5T model in predicting relative humidity

The performance evaluation of the M5T model for the prediction of relative humidity at Khunjerab, Naltar, and Ziarat is displayed in Tables 6–8, respectively. The M5T model also performed well at all meteorological stations during both the training and testing stages, and it provided the best predictions for the sixth input combination (S6) at all stations, which is highlighted in bold. Overall, the M5T model performance was slightly lower than that of MARS. The M5T model also performed better during training than during testing at all meteorological stations, and it provided the best prediction of relative humidity at Ziarat compared with Naltar and Khunjerab (Table 8). However, it did not perform well for the S1, S2, and S3 scenarios, with R2 < 0.50 at all meteorological stations (Tables 6–8). A previous study conducted by [8] observed that the LSTM model is capable of forecasting complex univariate relative humidity time series. In contrast, [3] suggested that ARIMA can provide a better prediction of relative humidity than LSTM.

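| Scenario | Input Combinations | Training RMSE (%) | Training MAE (%) | Training R2 | Testing RMSE (%) | Testing MAE (%) | Testing R2 |
|---|---|---|---|---|---|---|---|
| S1 | RHt-1 | 11.08 | 8.56 | 0.476 | 13.64 | 10.54 | 0.378 |
| S2 | RHt-1, RHt-2 | 10.94 | 8.45 | 0.486 | 13.61 | 10.53 | 0.382 |
| S3 | RHt-1, RHt-2, RHt-3 | 10.90 | 8.41 | 0.491 | 13.56 | 10.49 | 0.391 |
| S4 | RHt-1, RHt-2, RHt-3, Tt-1 | 6.71 | 5.76 | 0.758 | 6.94 | 6.32 | 0.726 |
| S5 | RHt-1, RHt-2, RHt-3, Tt-1, Tt-2 | 6.62 | 5.64 | 0.765 | 6.89 | 6.13 | 0.748 |
| **S6** | **RHt-1, RHt-2, RHt-3, Tt-1, Tt-2, Tt-3** | **5.94** | **5.08** | **0.796** | **6.14** | **5.56** | **0.772** |
| S7 | RHt-1, RHt-2, RHt-3, Pt-1 | 9.79 | 7.68 | 0.598 | 11.77 | 8.82 | 0.529 |
| S8 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2 | 9.20 | 7.32 | 0.639 | 10.92 | 8.44 | 0.541 |
| S9 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2, Pt-3 | 9.13 | 7.24 | 0.642 | 10.83 | 8.36 | 0.548 |
| S10 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2, Pt-3, Tt-1, Tt-2, Tt-3 | 6.43 | 5.32 | 0.772 | 6.23 | 5.81 | 0.752 |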

Table 6.

The statistical evaluation of the M5T model at Khunjerab.

Bold values represent the best input data combination.


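| Scenario | Input Combinations | Training RMSE (%) | Training MAE (%) | Training R2 | Testing RMSE (%) | Testing MAE (%) | Testing R2 |
|---|---|---|---|---|---|---|---|
| S1 | RHt-1 | 11.12 | 8.57 | 0.476 | 13.68 | 10.58 | 0.378 |
| S2 | RHt-1, RHt-2 | 10.92 | 8.45 | 0.486 | 13.62 | 10.54 | 0.381 |
| S3 | RHt-1, RHt-2, RHt-3 | 10.89 | 8.38 | 0.491 | 13.57 | 10.48 | 0.384 |
| S4 | RHt-1, RHt-2, RHt-3, Tt-1 | 6.76 | 5.79 | 0.752 | 6.90 | 6.36 | 0.721 |
| S5 | RHt-1, RHt-2, RHt-3, Tt-1, Tt-2 | 6.58 | 5.61 | 0.760 | 6.81 | 6.18 | 0.742 |
| **S6** | **RHt-1, RHt-2, RHt-3, Tt-1, Tt-2, Tt-3** | **5.82** | **5.12** | **0.791** | **6.19** | **5.58** | **0.762** |
| S7 | RHt-1, RHt-2, RHt-3, Pt-1 | 9.84 | 7.67 | 0.598 | 11.76 | 8.79 | 0.529 |
| S8 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2 | 9.28 | 7.38 | 0.638 | 10.96 | 8.48 | 0.541 |
| S9 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2, Pt-3 | 9.11 | 7.23 | 0.646 | 10.77 | 8.32 | 0.550 |
| S10 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2, Pt-3, Tt-1, Tt-2, Tt-3 | 6.52 | 5.46 | 0.767 | 6.37 | 5.94 | 0.758 |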

Table 7.

The statistical evaluation of the M5T model at Naltar.

Bold values represent the best input data combination.


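| Scenario | Input Combinations | Training RMSE (%) | Training MAE (%) | Training R2 | Testing RMSE (%) | Testing MAE (%) | Testing R2 |
|---|---|---|---|---|---|---|---|
| S1 | RHt-1 | 11.12 | 8.54 | 0.476 | 13.64 | 10.56 | 0.378 |
| S2 | RHt-1, RHt-2 | 10.95 | 8.43 | 0.487 | 13.61 | 10.54 | 0.381 |
| S3 | RHt-1, RHt-2, RHt-3 | 10.92 | 8.39 | 0.491 | 13.58 | 10.49 | 0.384 |
| S4 | RHt-1, RHt-2, RHt-3, Tt-1 | 6.67 | 5.70 | 0.758 | 6.82 | 6.27 | 0.728 |
| S5 | RHt-1, RHt-2, RHt-3, Tt-1, Tt-2 | 6.47 | 5.52 | 0.764 | 6.72 | 6.10 | 0.752 |
| **S6** | **RHt-1, RHt-2, RHt-3, Tt-1, Tt-2, Tt-3** | **5.74** | **5.04** | **0.796** | **6.08** | **5.46** | **0.783** |
| S7 | RHt-1, RHt-2, RHt-3, Pt-1 | 9.79 | 7.66 | 0.599 | 11.72 | 8.78 | 0.530 |
| S8 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2 | 9.20 | 7.32 | 0.640 | 10.90 | 8.42 | 0.541 |
| S9 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2, Pt-3 | 9.12 | 7.21 | 0.645 | 10.78 | 8.32 | 0.550 |
| S10 | RHt-1, RHt-2, RHt-3, Pt-1, Pt-2, Pt-3, Tt-1, Tt-2, Tt-3 | 6.26 | 5.16 | 0.800 | 6.08 | 5.58 | 0.778 |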

Table 8.

The statistical evaluation of the M5T model at Ziarat.

Bold values represent the best input data combination.


At Khunjerab station, the M5T model performed well (RMSE= 5.94%, MAE = 5.08%, R2= 0.796) in case of S6 input combination during model training stage whereas it displayed low prediction performance (RMSE= 6.14%, MAE= 5.56%, R2= 0.772) during testing stage as shown in Table 6. Similarly, the M5T model did not perform well for the S1, S2, and S3 scenarios (R2 <0.50). Similarly, at Naltar station, the M5T model performed reasonably well (RMSE= 5.82%, MAE= 5.12%, R2= 0.791) for S6 input combination during training stage whereas it exhibited a slightly low performance (RMSE= 6.19%, MAE= 5.58%, R2= 0.762) during testing stage as presented in Table 7.

However, the M5T model provided the best prediction of relative humidity at the Ziarat station for the S6 input combination (Table 8). The M5T model performed better during training (RMSE= 5.74%, MAE= 5.04%, R2= 0.796) as compared to testing (RMSE= 6.08%, MAE= 5.46%, R2= 0.783) stage as displayed in Table 8.

The M5T model performance was also evaluated by drawing scatter plots of observed against predicted daily relative humidity from 2007 to 2009, as displayed in Figure 4. The scatter plots showed that the M5T model can also predict relative humidity fairly well at all meteorological stations, especially at Ziarat (R2 = 0.783) for the S6 input combination during the testing stage (Figure 4).

Figure 4.

Scatter plots between observed and predicted relative humidity by using the M5T model at (a) Khunjerab; (b) Naltar and (c) Ziarat.

4.3 Time variations of the observed and predicted relative humidity by MARS and M5T models

Time variations of the observed and predicted relative humidity by the MARS and M5T models at the Khunjerab, Naltar, and Ziarat meteorological stations are displayed in Figures 5–7. The time variation plots were drawn using the best-predicted relative humidity data (i.e. the S6 scenario) on a daily basis from 2007 to 2009. Figure 5 shows that both models captured the time-series variations of relative humidity very well with reference to the observed data at Khunjerab station but slightly underestimated the values from day 900 to day 1100. Moreover, these models slightly underestimated the low and high values of relative humidity with reference to the observed data at a few points throughout the time series. Overall, the MARS model performed better than M5T for the prediction of daily relative humidity at Khunjerab.

Figure 5.

Time variation of the observed and predicted relative humidity by MARS and M5T model at Khunjerab station.

Figure 6.

Time variation of the observed and predicted relative humidity by MARS and M5T model at Naltar station.

Figure 7.

Time variation of the observed and predicted relative humidity by MARS and M5T model at Ziarat station.

The MARS and M5T models also captured the time-series variation of relative humidity very well with respect to the observed data at Naltar station for the S6 input combination, as displayed in Figure 6. Both models slightly underestimated the relative humidity from day 850 to day 1100. Moreover, these models slightly underestimated the low and high values of relative humidity at some points throughout the study period. Overall, the MARS model provided better predictions of relative humidity than M5T at Naltar (Figure 6).

However, both machine learning models provided their best prediction of relative humidity at Ziarat, which is a mid-altitude meteorological station, as shown in Figure 7. Both models captured the temporal variations of relative humidity very well throughout the period with reference to the observed data for the S6 input combination. Furthermore, the models underestimated the low and high values of relative humidity with reference to the observed data. The MARS model predicted the low and high values of relative humidity fairly well, but it slightly underestimated the values at a few points throughout the study period. Overall, the MARS model provided better predictions of relative humidity than M5T at Ziarat (Figure 7).
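All of the comparisons above rely on the S6 input combination (RHt-1, RHt-2, RHt-3, Tt-1, Tt-2, Tt-3). As a rough illustration of how such lagged inputs could be assembled from daily series, the sketch below builds one input row per day from the three preceding days. The function name `build_s6_samples` and the sample values are hypothetical; this is not the preprocessing used in the study, only one plausible way to arrange the data.

```python
def build_s6_samples(rh, temp, n_lags=3):
    """Build lagged inputs and targets for an S6-style scenario.

    Each input row is [RH(t-1), RH(t-2), RH(t-3), T(t-1), T(t-2), T(t-3)]
    and the matching target is RH(t). The first n_lags days are skipped
    because they lack a full set of lagged values.
    """
    if len(rh) != len(temp):
        raise ValueError("rh and temp must have the same length")
    inputs, targets = [], []
    for t in range(n_lags, len(rh)):
        rh_lags = [rh[t - k] for k in range(1, n_lags + 1)]
        temp_lags = [temp[t - k] for k in range(1, n_lags + 1)]
        inputs.append(rh_lags + temp_lags)
        targets.append(rh[t])
    return inputs, targets


# Hypothetical daily relative humidity (%) and temperature (deg C) values.
rh = [50.0, 52.0, 47.0, 60.0, 58.0]
temp = [1.0, 2.0, 3.0, 4.0, 5.0]

X, y = build_s6_samples(rh, temp)
for row, target in zip(X, y):
    print(row, "->", target)
```

The resulting rows could then be split chronologically into training and testing periods before being passed to a regression model.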

5. Conclusions

Relative humidity has an important impact on plant growth, human health, industry, weather, and climate. Any change in temperature and relative humidity may result in droughts, heatwaves, floods, and hurricanes. Thus, relative humidity is one of the important factors for measuring environmental change. Keeping in view this importance, the current study has attempted to predict the relative humidity in a high elevated alpine basin (Hunza) of western Karakoram by using the MARS and M5T machine learning models. The current study is novel in that no previous study has attempted to predict the relative humidity in a high elevation alpine basin.

Statistical analysis of the model outputs suggested that both models produced reliable predictions of relative humidity at the Khunjerab, Naltar, and Ziarat meteorological stations of the Hunza basin during both the training and testing stages. Out of 10 input data combinations of temperature, precipitation, and relative humidity, the 6th combination (i.e. RHt-1, RHt-2, RHt-3, Tt-1, Tt-2, Tt-3) produced the best results for each station by each model. The statistical indicators confirmed the good performance of both models at all stations. For the MARS model, the RMSE, MAE, and R2 values ranged from 5.26–5.63%, 4.51–4.59%, and 0.826–0.856, respectively, during the training stage, while they ranged from 5.86–6.58%, 4.97–5.43%, and 0.806–0.815, respectively, during the testing stage. For the M5T model, the RMSE, MAE, and R2 values ranged from 5.74–5.94%, 5.04–5.12%, and 0.791–0.796, respectively, during the training stage, whereas they ranged from 6.08–6.19%, 5.46–5.58%, and 0.762–0.783, respectively, during the testing stage. Both models showed poor performance (R2 < 0.50) for the S1, S2, and S3 input combinations at all stations. Moreover, both models performed better in the training stage than in the testing stage, and both performed best at Ziarat as compared to the other stations. Overall, the MARS model performed better than M5T at all stations. The current study provides a baseline for future studies to predict other meteorological variables such as temperature, wind speed, solar radiation, and evapotranspiration by using machine learning tools in high altitude and remote basins that face the issue of data scarcity.

Acknowledgments

This study is supported by the Second Tibetan Plateau Scientific Expedition and Research Program (STEP, Grant No. 2019QZKK0208), and the Research Fund for Introducing Talents of Yunnan University (No. YJRC3201702). We are thankful to the Surface Water Hydrology Project of the Water and Power Development Authority (SWHP-WAPDA) of Pakistan for providing the required meteorological data to conduct this study.

Conflict of interest

The authors declare no conflict of interest.

References

  1. Khatibi R, Naghipour L, Ghorbani MA, Aalami MT. Predictability of relative humidity by two artificial intelligence techniques using noisy data from two Californian gauging stations. Neural Computing and Applications. 2013; 23(7):2241-52
  2. Ghadiri M, Marjani A, Mohammadinia S, Shokri M. Machine Learning Approaches for Accurate Prediction of Relative Humidity based on Temperature and Wet-Bulb Depression. 2020
  3. Li Z, Zou H, Qi B. Application of ARIMA and LSTM in Relative Humidity Prediction. In 2019 IEEE 19th International Conference on Communication Technology (ICCT). 2019; 16:1544-1549
  4. Azzouni A, Pujolle G. A long short-term memory recurrent neural network framework for network traffic matrix prediction. arXiv preprint. 2017; arXiv: 1705.05690
  5. Li H, Yang Y, Cheng Y, Jin Y, Luo H, Zhang L. Application of Time Series Model in Relative Humidity Prediction. In Journal of Physics: Conference Series. 2020; 1584 (1):012017
  6. Gangishetty MK, Scott RW, Kelly TL. Effect of relative humidity on crystal growth, device performance and hysteresis in planar heterojunction perovskite solar cells. Nanoscale. 2016; 8(12):6300-7
  7. Yu X. Indication of relative humidity of ECMWF in precipitation forecast in Hainan Prefecture. Qinghai Meteorology. 2009; 3:17-20
  8. Hutapea MI, Pratiwi YY, Sarkis IM, Jaya IK, Sinambela M. Prediction of relative humidity based on long short-term memory network. In AIP Conference Proceedings. 2020. 2221 (1): 060003
  9. Quansah E, Amekudzi LK, Preko K. The influence of temperature and relative humidity on indoor ozone concentrations during the Harmattan. Journal of Emerging Trends in Engineering and Applied Sciences. 2012; 3(5):863-7
  10. Ohno H, Ohata T, Higuchi K. The influence of humidity on the ablation of continental-type glaciers. Annals of Glaciology. 1992; 16:107-14
  11. Hastenrath S. Recession of tropical glaciers. Science. 1994; 265(5180):1790-1
  12. Kaser G, Hardy DR, Mölg T, Bradley RS, Hyera TM. Modern glacier retreat on Kilimanjaro as evidence of climate change: observations and facts. International Journal of Climatology: A Journal of the Royal Meteorological Society. 2004; 24(3):329-39
  13. Mölg T, Hardy DR. Ablation and associated energy balance of a horizontal glacier surface on Kilimanjaro. Journal of Geophysical Research: Atmospheres. 2004; 109(D16)
  14. Rupper S, Roe G. Glacier changes and regional climate: A mass and energy balance approach. Journal of Climate. 2008; 21(20):5384-401
  15. Farhat N. Effect of relative humidity on evaporation rates in Nabatieh region. Lebanese Science Journal. 2018; 19(1):59
  16. Molano-Jimenez A, Orjuela-Cañón AD, Acosta-Burbano W. Temperature and Relative Humidity Prediction in Swine Livestock Buildings. In 2018 IEEE Latin American Conference on Computational Intelligence (LA-CCI). 2018; 1-4
  17. Shi X, Lu W, Zhao Y, Qin P. Prediction of indoor temperature and relative humidity based on cloud database by using an improved BP neural network in Chongqing. IEEE Access. 2018; 6:30559-66
  18. Singh VK, Tiwari KN. Prediction of greenhouse micro-climate using artificial neural network. Applied Ecology and Environmental Research. 2017; 15(1):767-78
  19. Gunawardhana LN, Al-Rawas GA, Kazama S. An alternative method for predicting relative humidity for climate change studies. Meteorological Applications. 2017; 24(4):551-9
  20. Bakar SZ, Ghazali RB, Ismail LH. Implementation of modified cuckoo search algorithm on functional link neural network for temperature and relative humidity prediction. In Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013). 2014; 151-158
  21. Martínez-Martínez V, Baladrón C, Gomez-Gil J, Ruiz-Ruiz G, Navas-Gracia LM, Aguiar JM, Carro B. Temperature and relative humidity estimation and prediction in the tobacco drying process using artificial neural networks. Sensors; 12(10):14004-21
  22. Özbalta TG, Sezer A, Yıldız Y. Models for prediction of daily mean indoor temperature and relative humidity: education building in Izmir, Turkey. Indoor and Built Environment. 2012; 21(6):772-81
  23. Chou SM, Lee TS, Shao YE, Chen IF. Mining the breast cancer pattern using artificial neural networks and multivariate adaptive regression splines. Expert Systems with Applications. 200; 27(1):133-42
  24. Adnan RM, Petroselli A, Heddam S, Santos CA, Kisi O. Short term rainfall-runoff modelling using several machine learning methods and a conceptual event-based model. Stochastic Environmental Research and Risk Assessment. 2021; 35(3):597-616
  25. Singh KK, Pal M, Singh VP. Estimation of mean annual flood in Indian catchments using back propagation neural network and M5 model tree. Water Resources Management. 2010; 24(10):2007-19
  26. Adnan RM, Liang Z, Trajkovic S, Zounemat-Kermani M, Li B, Kisi O. Daily streamflow prediction using optimally pruned extreme learning machine. Journal of Hydrology. 2019; 577:123981
  27. Sharda VN, Prasher SO, Patel RM, Ojasvi PR, Prakash C. Performance of Multivariate Adaptive Regression Splines (MARS) in predicting runoff in mid-Himalayan micro-watersheds with limited data/Performances de régressions par splines multiples et adaptives (MARS) pour la prévision d'écoulement au sein de micro-bassins versants Himalayens d'altitudes intermédiaires avec peu de données. Hydrological Sciences Journal. 2008; 53(6):1165-75
  28. Adamowski J, Chan HF, Prasher SO, Sharda VN. Comparison of multivariate adaptive regression splines with coupled wavelet transform artificial neural networks for runoff forecasting in Himalayan micro-watersheds with limited data. Journal of Hydroinformatics. 2012; 14(3):731-44
  29. Štravs L, Brilly M. Development of a low-flow forecasting model using the M5 machine learning method. Hydrological Sciences Journal. 2007; 52(3):466-77
  30. Sattari MT, Pal M, Apaydin H, Ozturk F. M5 model tree application in daily river flow forecasting in Sohu Stream, Turkey. Water Resources. 2013; 40(3):233-42
  31. Yaseen ZM, Kisi O, Demir V. Enhancing long-term streamflow forecasting and predicting using periodicity data component: application of artificial intelligence. Water Resources Management. 2016(b); 30(12):4125-51
  32. Yin Z, Feng Q, Wen X, Deo RC, Yang L, Si J, He Z. Design and evaluation of SVR, MARS and M5Tree models for 1, 2 and 3-day lead time forecasting of river flow data in a semiarid mountainous catchment. Stochastic Environmental Research and Risk Assessment. 2018; 32(9):2457-76
  33. Nourani V, Davanlou Tajbakhsh A, Molajou A, Gokcekus H. Hybrid wavelet-M5 model tree for rainfall-runoff modeling. Journal of Hydrologic Engineering. 2019; 24(5):04019012
  34. Al-Sudani ZA, Salih SQ, Yaseen ZM. Development of multivariate adaptive regression spline integrated with differential evolution model for streamflow simulation. Journal of Hydrology. 2019; 573:1-2
  35. Mehdizadeh S, Fathian F, Safari MJ, Adamowski JF. Comparative assessment of time series and artificial intelligence models to estimate monthly streamflow: a local and external data analysis approach. Journal of Hydrology. 2019; 579:124225
  36. Adnan RM, Liang Z, Heddam S, Zounemat-Kermani M, Kisi O, Li B. Least square support vector machine and multivariate adaptive regression splines for streamflow prediction in mountainous basin using hydro-meteorological data as inputs. Journal of Hydrology. 2020; 586:124371
  37. Fathian F, Mehdizadeh S, Sales AK, Safari MJ. Hybrid models to improve the monthly river flow prediction: Integrating artificial intelligence and non-linear time series models. Journal of Hydrology. 2019; 575:1200-13
  38. Kisi O, Heddam S. Evaporation modelling by heuristic regression approaches using only temperature data. Hydrological Sciences Journal. 2019; 64(6):653-72
  39. Kaya YZ, Mamak M, Üneş F, Demirci M. Evapotranspiration prediction using M5T method and Ritchie equation for St. Johns, FL, USA. 2017
  40. Alipour A, Yarahmadi J, Mahdavi M. Comparative study of M5 model tree and artificial neural network in estimating reference evapotranspiration using MODIS products. Journal of Climatology. 2014; 2014
  41. Shrestha S, Nepal S. Water balance assessment under different glacier coverage scenarios in the Hunza Basin. Water. 2019; 11(6):1124
  42. Saifullah M, Liu S, Adnan M, Ashraf M, Zaman M, Hashim S, Muhammad S. Risks of Glaciers Lakes Outburst Flood along China Pakistan Economic Corridor. In Glaciers and Polar Environment. 2020. IntechOpen
  43. Hewitt K. Glacier change, concentration, and elevation effects in the Karakoram Himalaya, Upper Indus Basin. Mountain Research and Development. 2011; 31(3):188-200
  44. Ali AF, Zhang XP, Adnan M, Iqbal M, Khan G. Projection of future streamflow of the Hunza River Basin, Karakoram Range (Pakistan) using HBV hydrological model. Journal of Mountain Science. 2018; 15(10):2218-35
  45. Tahir AA, Chevallier P, Arnaud Y, Ahmad B. Snow cover dynamics and hydrological regime of the Hunza River basin, Karakoram Range, Northern Pakistan. Hydrology and Earth System Sciences. 2011; 15(7):2275-90
  46. Shrestha M, Koike T, Hirabayashi Y, Xue Y, Wang L, Rasul G, Ahmad B. Integrated simulation of snow and glacier melt in water and energy balance based, distributed hydrological modeling framework at Hunza River Basin of Pakistan Karakoram region. Journal of Geophysical Research: Atmospheres. 2015; 120(10):4889-919
  47. Quinlan JR. Learning with continuous classes. In 5th Australian joint conference on artificial intelligence. 1992; 92:343-348
  48. Witten IH, Frank E, Hall MA, Pal CJ. Practical machine learning tools and techniques. Morgan Kaufmann. 2005:578
  49. Rahimikhoob A, Asadi M, Mashal M. A comparison between conventional and M5 model tree methods for converting pan evaporation to reference evapotranspiration for semi-arid region. Water Resources Management. 2013; 27(14):4815-26
  50. Adnan RM, Petroselli A, Heddam S, Santos CA, Kisi O. Comparison of different methodologies for rainfall–runoff modeling: machine learning vs. conceptual approach. Natural Hazards. 2021; 105(3):2987-3011
  51. Adnan RM, Liang Z, Yuan X, Kisi O, Akhlaq M, Li B. Comparison of LSSVR, M5RT, NF-GP, and NF-SC models for predictions of hourly wind speed and wind power based on cross-validation. Energies. 2019; 12(2):329
  52. Pal M, Deswal S. M5 model tree based modelling of reference evapotranspiration. Hydrological Processes: An International Journal. 2009; 23(10):1437-43
  53. Friedman JH. Multivariate adaptive regression splines. The Annals of Statistics. 1991:1-67
  54. Zhang W, Goh AT. Multivariate adaptive regression splines and neural network models for prediction of pile drivability. Geoscience Frontiers. 2016; 7(1):45-52
  55. Kisi O, Parmar KS. Application of least square support vector machine and multivariate adaptive regression spline models in long term prediction of river water pollution. Journal of Hydrology. 2016; 534:104-12
  56. Liu Z, Zhou P, Chen G, Guo L. Evaluating a coupled discrete wavelet transform and support vector regression for daily and monthly streamflow forecasting. Journal of Hydrology. 2014; 519:2822-31
  57. Yuan X, Wu X, Tian H, Yuan Y, Adnan RM. Parameter identification of nonlinear Muskingum model with backtracking search algorithm. Water Resources Management. 2016; 30(8):2767-83
  58. Meng E, Huang S, Huang Q, Fang W, Wu L, Wang L. A robust method for non-stationary streamflow prediction based on improved EMD-SVM model. Journal of Hydrology. 2019; 568:462-78
  59. Zhou Y, Guo S, Chang FJ. Explore an evolutionary recurrent ANFIS for modelling multi-step-ahead flood forecasts. Journal of Hydrology. 2019; 570:343-55
  60. Alizamir M, Kisi O, Muhammad Adnan R, Kuriqi A. Modelling reference evapotranspiration by combining neuro-fuzzy and evolutionary strategies. Acta Geophysica. 2020; 68:1113-26
  61. Yuan X, Chen C, Lei X, Yuan Y, Adnan RM. Monthly runoff forecasting based on LSTM–ALO model. Stochastic environmental research and risk assessment. 2018; 32:2199-212
