Station locations and period of records.

## Abstract

Reference evapotranspiration (ETo) is an important and one of the most difficult components of the hydrologic cycle to quantify accurately. Estimation/measurement of ETo is not simple as there are number of climatic parameters that can affect the process. There exists copious conventional (direct and indirect) and non conventional/soft computing (artificial neural networks, ANNs) methods for estimating ETo. Direct methods have the limitations of measurement errors, expensive, impracticality of acquiring point measurements for spatially variable locations, whereas the indirect methods have the limitations of unavailability of all necessary climate data and lack of generalizability (needs local calibration). In contrast to conventional methods, soft computing models can estimate ETo accurately with minimum climate data which have advantages over limitations of conventional ETo methods. This chapter reviews the application of ANN methods in estimating ETo accurately for 15 locations in India using six climatic variables as input. The performance of ANN models were compared with the multiple linear regression (MLR) models in terms of root mean squared error, coefficient of determination and ratio of average output and target ETo values. The results suggested that the ANN models performed better as compared to MLR for all locations.

### Keywords

- evapotranspiration
- ANN
- climate
- data
- Gaussian
- lysimeter

## 1. Introduction

Evapotranspiration (ET) is the combining process of evaporation and transpiration losses. Almost 62% of precipitation falls on continents are returned back to the atmosphere through the ET process [1]. ET plays a significant role in the hydrological cycle and its estimation is very important in various fields of water resources. A common procedure for estimating actual crop evapotranspiration (ET_{crop}) is to first estimate reference evapotranspiration (ET_{o}) and to then apply an appropriate crop coefficient (k_{c}). ET_{o} is an important and one of the most difficult components of the hydrologic cycle to quantify accurately. ET_{o} is measured from a hypothetic crop of uniform height (12 cm), active growing (crop resistance of 70 s m^{−1}), completely shading the ground (albedo of 0.23) and unlimited supply of water [2]. The Food and Agricultural Organization (FAO) consider the above definition as standard and sole method for estimating ET_{o} if sufficient climatic data are available [3, 4].

Estimation of ET_{o} is complex due to influence of various climatic variables (maximum and minimum air temperature, wind speed, solar radiation and maximum and minimum relative humidity) and existence of nonlinearity in between climatic data and ET_{o.} Though users have number of methods for measuring or estimating ET_{o} directly or indirectly, most of them have some limitations regarding data availability or regional applicability. In addition, in order to use these methods, users are required to make reasonable estimates for some of the parameters in the employed ET_{o} models, which involve some uncertainties and might not result in reliable ET_{o} estimates [5]. Further, it is difficult to develop accurately representative physically based models for the complex non-linear hydrological processes, such as ET_{o}. This is because the physical relationships involving in a system can be too complicated to be accurately represented in a physically based manner.

The above limitations lead to the need of developing some techniques that can accurately estimate ET_{o} values with a minimum input data and are also easy to apply without knowing physical process inside the system. Artificial neural network (ANN) techniques, which can provide a model to predict and investigate the process without having a complete understanding of it, can be a useful tool for the above purpose. These techniques are also interesting because of its knowledge discovering property. In contrast to conventional methods, ANNs can estimate ET_{o} accurately with minimum climate data, which may have the advantages of being inexpensive, independent of specific climatic condition, ignored physical relations, and precise modeling of nonlinear complex system. In the last decade, many researchers have used ANN techniques for modeling of the ET_{o} process [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25].

## 2. Review of literature

This section follows the discussion of some of the significant contributions made by various researchers in the application of different ANN techniques for modeling ET_{o} or pan evaporation (E_{p}). A radial basis function (RBF) neural network was developed in C language to estimate daily soil water evaporation [26]. The input layer of network consists of average relative air humidity, air temperature, wind speed (Ws) and soil water content. They compared the results of RBF networks with the multiple linear regression (MLR) techniques. A feed-forward back propagation (FFBP)-based ANN model was developed to estimate daily E_{p} based on measured weather variables [27]. They used different input combinations to model E_{p}. They compared the developed ANN models with the Priestly-Taylor & MLR models. RBF neural network model was developed to estimate the FAO Blaney-Criddle *b* factor [28]. The input layer to RBF model consists of minimum daily relative humidity (RH_{min}), day time Ws and mean ratio of actual to possible sunshine hours (n/N). The *b* values estimated by the RBF models were compared to the appropriate *b* values produced using regression equations. FFBP ANN models were implemented for the estimation of daily ET_{o} using six basic climatic parameters as inputs [16]. They trained ANNs using three learning methods (with different learning rates and momentum coefficients), different number of processing elements in the hidden layers, and the number of hidden layers. The compared the results of developed ANN models with the Penman Monteith (PM) method and with a lysimeter measured ET_{o}. ANN-based back propagation models for estimating Class A E_{p} with minimum climate data (four input combinations) were developed and compared with the existing conventional methods [22].

The potential of RBF neural network for estimating the rice daily crop ET using limited climatic data was demonstrated [23]. Six RBF networks, each using varied input combinations of climatic variables were trained and tested. The model estimates were compared with measured lysimeter ET. A sequentially adaptive RBF network was applied for forecasting of monthly ET_{o} [29]. Sequential adaptation of parameters and structure was achieved using extended Kalman filter. Criterion for network growing was obtained from the Kalman filter’s consistency test. Criteria for neuron/connections pruning were based on the statistical parameter significance test. The results showed that the developed network was learned to forecast ET_{o},_{t + 1} (current or next month) based on ET_{o},_{t-11} (at a lag of 12 months) and ET_{o},_{t-23} (at a lag of 24 months) with high reliability. The study examined that whether it is possible to attain the reliable estimation of ET_{o} only on the basis of the temperature data [24]. He developed RBF neural network for estimating ET_{o} and compared the developed model with temperature-based empirical models.

ANN-based daily ET_{o} models were trained based on different categories of conventional ET_{o} estimation methods such as temperature based (FAO-24 Blaney-Criddle), radiation based (FAO-24 Radiation method for arid and Turc method for humid regions) and combinations of these (FAO-56 PM) [14]. A comparison of the Hargreaves and ANN methods for estimating monthly ET_{o} only on the basis of temperature data was done [19]. They developed ANN models with the data from six stations and tested these developed models with the data from remaining six stations, which were not used in model development. The capability of ANN for converting E_{p} to ET_{o} was studied using temperature data [18]. The conventional method that uses pan coefficient (K_{p}) as a factor to convert E_{p} to ET_{o} was considered for this comparison. Generalized ANN (GANN)-based ET_{o} models corresponding to FAO-56 PM, FAO-24 Radiation, Turc and FAO-24 Blaney-Criddle methods were developed [15]. These models were trained using the pooled data from four California Irrigation Management Information System (CIMIS) stations with FAO-56 PM computed values as targets. The developed GANN models were tested with different stations which were not used in training. Multilayer perceptron (MLP) neural networks for estimating the daily E_{p} using input data of maximum and minimum air temperature and the extraterrestrial radiation was developed [20]. The potential for the use of ANNs to estimate the ET_{o} based on air temperature data was examined [21]. He also conducted comparison of estimates provided by the ANNs and by Hargreaves equation by using the FAO-56 PM model as reference model.

## 3. Study area and data collected

For the purpose of this study, 15 meteorological stations in India were chosen. Figure 1 shows the geographical locations of these selected stations and their related agro-ecological regions (AERs). These stations are having daily meteorological data of from 2001 to 2005 of following variables: minimum air temperature (T_{min}), maximum air temperature (T_{max}), minimum relative humidity (RH_{min}), maximum relative humidity (RH_{max}), mean wind speed (w_{s}), and solar radiation (S_{ra}). Table 1 shows the details of 15 climatic stations of India along with altitude and duration of available data. The study area is bounded between the longitudes of 68° 7′ and 97° 25′ E and the latitudes of 8° 4′ and 37° 6’ N. The annual potential evapotranspiration of India is 1771 mm. It varies from a minimum of 1239 mm in Jammu and Kashmir to a maximum of 2100 mm in Gujarat [30].

AER | Location | Alt. (m) | Period | T_{max} (°C) | T_{min} (°C) | RH_{max} (%) | RH_{min} (%) | W_{s} (km h^{−1}) | S_{ra} (MJ m^{−2} day^{−1}) |
---|---|---|---|---|---|---|---|---|---|

Semi-arid | Parbhani | 423 | 2001–2005 | 33.75 | 18.32 | 71.13 | 41.02 | 5.04 | 20.87 |

Solapur | 25 | 2001–2005 | 34.15 | 20.14 | 73.28 | 45.09 | 6.15 | 18.96 | |

Bangalore | 930 | 2001–2005 | 28.90 | 17.70 | 89.15 | 47.30 | 8.68 | 18.95 | |

Kovilpatti | 90 | 2001–2005 | 35.11 | 23.37 | 80.36 | 48.52 | 6.60 | 19.30 | |

Udaipur | 433 | 2001–2005 | 31.81 | 16.33 | 72.36 | 36.44 | 3.74 | 19.45 | |

Arid | Anantapur | 350 | 2001–2005 | 34.43 | 21.78 | 73.32 | 33.91 | 9.64 | 20.27 |

Hissar | 215 | 2001–2005 | 31.17 | 16.23 | 81.00 | 44.27 | 5.20 | 17.26 | |

Sub-humid | Raipur | 298 | 2001–2005 | 32.60 | 19.91 | 80.62 | 44.08 | 5.33 | 17.80 |

Faizabad | 133 | 2001–2005 | 31.56 | 18.18 | 87.02 | 52.11 | 3.51 | 17.88 | |

Ludhiana | 247 | 2001–2005 | 30.06 | 17.42 | 83.97 | 49.14 | 4.26 | 18.10 | |

Ranichauri | 1600 | 2001–2005 | 20.08 | 9.66 | 81.15 | 61.55 | 4.99 | 16.23 | |

Humid | Palampur | 1291 | 2001–2005 | 24.41 | 13.24 | 69.70 | 57.88 | 5.56 | 16.35 |

Jorhat | 86 | 2001–2005 | 27.97 | 19.23 | 92.70 | 75.27 | 3.00 | 14.68 | |

Mohanpur | 10 | 2001–2005 | 32.20 | 21.04 | 96.18 | 61.48 | 1.27 | 18.06 | |

Dapoli | 250 | 2001–2005 | 31.13 | 18.87 | 93.77 | 69.22 | 4.92 | 18.02 |

## 4. Theoretical consideration

The concept of neural networks was introduced by [31]. The neural-network approach, also referred to as ‘connectionism’ or ‘paralleled distributed processing’, adopts a “Brain metaphor” of information processing. Information processing in a neural network occurs through interactions involving large number of simulated neurons. A neural network (NN) is a simplified model of the human brain nervous system consisting of interconnected neurons in a parallel distributed system, which can learn and memorize the information. In NN, the interneuron connection strengths, known as ‘synaptic weights’ are used to store the acquired knowledge [32]. In other words, ANN discovers the relationship between a set of inputs and desired outputs without giving any information about the actual processes involved; it is in essence based on pattern recognition. ANNs consist of a number of interconnected processing elements or neurons. How the inter-neuron connections are arranged determines the topology of a network. A neuron is the fundamental unit of human brain’s nervous system that receives and combines signals from other neurons through input paths called ‘dendrites’. Each signal coming to a neuron along a dendrite passes through a junction called ‘synapse’, which is filled with neuro-transmitter fluid that produce electrical signals to reach to the soma or cell body where processing of the signals occurs [16]. If the combined input signal after processing is stronger than the threshold value, the neuron activates, producing an output signal, which is transferred through the axon to the other neurons. Similarly, ANN consists of a large number of simple processing units called neurons (or nodes) linked by weighted connections. A comprehensive description of neural networks was presented in a series of papers [33, 34, 35], which provide valuable information for the researchers.

### 4.1 Model of a neuron

The main function of artificial neuron is to generate output from an activated nonlinear function with the weighted sum of all inputs. Figure 2 illustrates a nonlinear model of a neuron, which forms the basis for designing ANN. The input layer neurons receive the input signals (x_{i}) and these signals are passed to the cell body through the synapses. A set of synapses or connecting links is characterized by its own weight or strength. A signal at the input of synapse ‘i’ connected to neuron ‘k’ is multiplied by the synaptic weight ‘w_{ki}’. The input signals, weighted by the respective synapses of the neuron are added by a linear combiner. An activation function or squashing function is used for limiting the permissible amplitude range of the output of a neuron to some finite value. An external bias (b_{k}) has an effect of increasing or decreasing the net input of the activation function depending on whether it is positive or negative, respectively.

In the mathematical form, a neuron k may be described by the following equations:

where x_{1}, x_{2}, x_{3,} ……….. x_{n} = input signals; w_{k1},w_{k2}, …….w_{kn} = synaptic weights of neuron k; u_{k} = linear combiner output due to the input signal; b_{k} = bias; φ(.) = activation function; y_{k} = output signal of the neuron k.

Let *v*_{k} be the induced local field or activation potential, which is given as:

Now, Eqs. (1), (2) and (3) can be written as:

In Eq. (5), a new synapse with input x_{0} = +1 is added and its weight is w_{k0} = b_{k} to consider the effect of the bias.

### 4.2 Neural network architecture parameters

Determination of appropriate neural network architecture is one of the most important tasks in model-building process. Various types of neural networks are analyzed to find the most appropriate architecture of a particular problem. Multilayer feed forward networks are found to outperform all the others. Although multilayer feed forward networks are one of the most fundamental models, they are the most popular type of ANN structure suited for practical applications.

### 4.3 Number of hidden layers

There is no fixed rule for selection of hidden layers of a network. Therefore, trial and error method was used for selection of number of hidden layers. Even one hidden layer of neuron (operating sigmoid activation function) can also be sufficient to model any solution surface of practical interest [36].

### 4.4 Number of hidden neurons

The ability of the ANN to generalize data not included in training depends on selection of sufficient number of hidden neurons to provide a means for storing higher order relationships necessary for adequately abstracting the process. There is no direct and precise way of determining the most appropriate number of neurons to include in hidden layer and this problem becomes more complicated as number of hidden layer increases. Some studies indicated that more number of neurons in hidden layer provide a solution surface that closely fit to training patterns. But in practice, more number of hidden neurons results the solution surface that deviate significantly from the trend of the surface at intermediate points or provide too literal interpretation of the training points which is called ‘over fitting’. Further, large number of hidden neurons reduces the speed of operation of network during training and testing. However, few hidden neurons results inaccurate model and provide a solution surface that deviates from training patterns. Therefore, choosing optimum number of hidden neurons is one of the important training parameter in ANN. To solve this problem, several neural networks with different number of hidden neurons are used for calibration/training and one with best performance together with compact structure is accepted.

### 4.5 Types of activation functions

The activation function or transfer function, denoted by φ(*v*), defines the output of a neuron in terms of the induced local field *v*. It is valuable in ANN applications as it introduces a degree of nonlinearity between inputs and outputs. Logistic sigmoid, hyperbolic tangent and linear functions are some widely used transfer function in ANN modeling.

*Logistic sigmoid function:* This function is a continuous function that reduces the output into the range of 0–1 and is defined as [32]:

*Hyperbolic tangent function*: It is used when the desired range of output of a neuron is between −1 and 1 and is expressed as [32]:

*Linear function*: It calculates the neuron’s output by simply returning the value passed to it. It can be expressed as:

### 4.6 Neural network architectures

The manner in which the neurons of a neural network are structured is intimately linked with the learning algorithm used to train the network. This leads to the formation of network architectures. The neural network architectures are classified into distinct classes depending upon the information flow. The different network architectures are: (a) multilayer perceptions, (b) recurrent, (c) RBF, (d) Kohonen self-organizing feature map, etc.

### 4.7 Multilayer perceptions (MLPs)

MLPs are layered (single-layered or multi-layered) feed forward networks typically trained with static back-propagation (Figure 3). Therefore, it is also called as FFBP neural networks. These networks have found their way into countless applications requiring static pattern classification. This architecture consists of input layers, output layer(s) and one or more hidden layers. The input signal moves in only forward direction from the input nodes to the output nodes through the hidden nodes. The function of hidden layer is to perform intermediate computations in between input and output layers through weights. The major advantage of FFBP is that they are easy to handle and can easily approximate any input-output map [37].

### 4.8 Recurrent neural networks (RNN)

RNN may be fully recurrent networks (FRN) or partially recurrent networks (PRN). FNN sent the outputs of the hidden layer back to itself, whereas PRN initiates the fully RNN and add a feed-forward connection (Figure 3). A simple RNN could be constructed by a modification of the multilayered feed-forward network with the addition of a ‘context layer’. At first epoch, the new inputs are sent to the RNN and previous contents from the hidden layer are passed to context layer and at next epoch, the information is fed back to the hidden layer. Similarly, weights are calculated hidden to context and vice versa. The RNN can have an infinite memory depth and thus find relationship through time as well as through the instantaneous input space. Recurrent networks are the state-of-the-art in nonlinear time series prediction, system identification, and temporal pattern classification [37, 38, 39].

### 4.9 Radial basis function (RBF) networks

RBF is a three-layer feed-forward network that consists of nonlinear Gaussian transfer function in between input and hidden layers and linear transfer function in between hidden and output layers (Figure 3). The requirement of hidden neurons for the RBF network is more as compared to standard FFBP, but these networks tend to learn much faster than MLPs [37]. The most common basis function used is Gauss function and is given by:

where *Ri* = basis or Gauss function; *c* = cluster center; *σij* = width of the Gaussian function. The centers and widths of the Gaussians are set by unsupervised learning rules, and supervised learning is applied to the output layer. After the center is determined, the connection weights between the hidden layer and output layer can be determined simply through ordinary back-propagation (gradient-descent) training. The output layer performs a simple weighted sum with a linear output and the weights of the hidden layer basis units (input to hidden layer) are set using some clustering techniques.

where *wi* = connection weight between the hidden neuron and output neuron; *wo* = bias; x_{i} = input vector.

### 4.10 ANN learning paradigms

Broadly speaking, there are two types of learning process namely, supervised and unsupervised. In supervised learning, the network is presented with examples of known input-output data pairs, after which it starts to mimic the presented input output behavior or pattern. In unsupervised learning, the network learns on their own, in a kind of self-study without teacher.

*Supervised learning*: It is also called ‘associative learning’ involves a mechanism of providing the network with a set of inputs and desired outputs. It is like learning with the help of a teacher. The so-called teacher has the knowledge of the environment and the knowledge is represented by a set of input-output examples. The environment is, however, unknown to the neural network. The network parameters (i.e., synaptic weights and error) are adjusted iteratively in a step-by-step fashion under the combined influence of the training vector and the error signal. After the completion of training, the neural network is able to deal with the environment completely by itself [32]. In supervised learning, FFBP NN is the most popular ones. In the FFBP NNs, neurons are organized into layers where information is passed from the input layer to the final output layer in a unidirectional manner. Any network in ANN consists of ‘neurons or nodes or parallel processing elements’ which interconnects the each layer with weights (W). A three layer (input (i), hidden (j) and target/output (k)) FFBP NN with weights W_{ij} and W_{jk} is shown in Figure 4. During training the FFBP NN, the initial or randomized weight values are corrected or adjusted as per calculated error in between output and target values and back-propagates these errors (from right to left in Figure 4) un till minimum error criteria achieved.

*Unsupervised learning:* Network is provided with inputs but not with desired outputs. The system itself must then decide what features it will use to group the input data. This is often referred to as self-organization or adaption. Provision is made for a task-independent measure of the quality of representation that the network is required to learn and the free parameters of the network are optimized with respect to that measure [32]. The most widely used unsupervised neural network is the Kohonen self-organizing map, KSOM.

### 4.11 Kohonen self-organizing map (KSOM)

KSOM maps the input data into two-dimensional discrete output map by clustering similar patterns. It consists of two interconnected layers namely, multi-dimensional input layer and competitive output layer with ‘w’ neurons (Figure 5).

Each node or neuron ‘i’ (i = 1, 2, … w) is represented by an n-dimensional weight or reference vector w_{i} = [w_{i1},….,w_{in}]. The ‘w’ nodes can be ordered so that similar neurons are located together and dissimilar neurons are remotely located on the map. The topology of network is indicated by the number of output neurons and their interconnections. The general network topology of KSOM is either a rectangular or a hexagonal grid. The number of neurons (map size), w, may vary from a few dozen up to several thousands, which affects accuracy and generalization capability of the KSOM. The optimum number of neurons (w) can be determined by below equation [41].

where N = total number of data samples or records. Once ‘w’ is known, the number of rows and columns in the KSOM can be determined as:

where l_{1} and l_{2} = number of rows and columns, respectively; e_{1} = biggest eigen value of the training data set; e_{2} = second biggest eigen value.

### 4.12 Training the KSOM

The KSOM is trained iteratively: initially the weights are randomly assigned. When the n-dimensional input vector x is sent through the network, the Euclidean distance between weight ‘w’ neurons of SOM and the input is computed by,

where x_{i} = ith data sample or vector; w_{i} = prototype vector for x_{i};│denotes Euclidian distance.

The best matching unit (BMU) is also called as ‘winning neuron’ is the weight that closely matching to the input. The learning process takes place in between BMU and its neighboring neurons at each training iteration ‘t’ with an aim to reduce the distance between weights and input.

where α = learning rate; *l* and m = positions of the winning neuron and its neighboring output nodes; *hlm* = neighborhood function of the BMU *l* at iteration *t*.

The most commonly used neighborhood function is the Gaussian which is expressed as:

where *l-m* = distance between neurons *l* and *m* on the map grid; σ = width of the topological neighborhood.

The training steps are repeated until convergence. After the KSOM network is constructed, the homogeneous regions, that is, clusters are defined on the map. The KSOM trained network performance is evaluated using two errors namely, total topographic error (t_{e}) and quantization error (q_{e}).

The topographic error, t_{e}, is an indication of the degree of preservation of the topology of the data when fitting the map to the original data set.

where u(x_{i}*)* = binary integer such that it is equal to 1 if the first and second best matching units of the map are not adjacent units; otherwise it is zero.

The quantization error, q_{e}, is an indication of the average distance between each data vector and its BMU at convergence, that is, the quality of the map fitting to the data.

where w_{li} = prototype vector of the best matching unit for x_{i}.

### 4.13 Type of ANN training algorithms

Training basically involves feeding training samples as input vectors through a neural network, calculating the error of the output layer, and then adjusting the weights of the network to minimize the error. There are different methods for adjusting the weights. These methods are called as “training algorithms”. The objective of the training algorithm is to minimize the difference between the predicted output values and the measured output values [6]. Different training algorithms are: (i) gradient descent with momentum backpropagation (GDM) algorithm, (ii) Levenberg-Marquardt (LM) algorithm, (iii) Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi Newton algorithm, (iii) resilient back propagation (RBP) algorithm, (iv) conjugate gradient algorithm, (v) one-step secant (OSS) algorithm, (vi) cascade correlation (CC) algorithm, and (vii) Bayesian regularization (BR) algorithm. The training algorithms used in this study are only briefly described below.

### 4.14 Gradient descent with momentum back propagation (GDM) algorithm

This method uses back-propagation to calculate derivatives of performance cost function with respect to the weight and bias variables of the network. Each variable is adjusted according to the gradient descent with momentum. The equation used for update of weight and bias is given by:

where Δ*wji*(*n*) = correction applied to the synaptic weight connecting neuron i to neuron j; α = momentum; *η* = learning-rate parameter; E = error function. The equation is known as the generalized delta rule and this is probably the simplest and most common way to train a network [37].

### 4.15 Levenberg-Marquardt (LM) algorithm

This method is a modification of the classic Newton algorithm for finding an optimum solution to a minimization problem. In particular the LM utilizes the so called Gauss-Newton approximation that keeps the Jacobian matrix and discards second order derivatives of the error. The LM algorithm interpolates between the Gauss-Newton algorithm and the method of gradient descent. To update weights, the LM algorithm uses an approximation of the Hessian matrix.

where W = weight; e = errors; I = identity matrix; λ = learning parameter; J = Jacobian matrix (first derivatives of errors with respect to the weights and biases);

### 4.16 Online and batch modes of training

On-Line learning updates the weights after the presentation of each exemplar. In contrast, Batch learning updates the weights after the presentation of the entire training set. When the training datasets are highly redundant, the online mode is able to take the advantage of this redundancy and provides effective solutions to large and difficult problems. On the other hand, the batch mode of training provides an accurate estimate of gradient vector; convergence of local minimum is thereby guaranteed under simple conditions [23].

### 4.17 Multiple linear regression (MLR)

MLR technique attempts to model the relationship between two or more explanatory (independent) variables and a response (dependent) variable by fitting a linear equation to the observed data. The general form of a MLR model is given as [42]:

where Y_{i} = ith observations of each of the dependent variable Y; X_{1, i,} X_{2,i,} ⋯, X_{k, i} = ith observations of each of the independent variables X_{1,} X_{2}, ⋯, X_{k} respectively; β_{0}, β_{1}, β_{2}, ⋯, β_{n} = fixed (but unknown) parameters; ε_{i} = random variable that is normally distributed.

The task of regression modeling is to estimate the unknown parameters (β_{0}, β_{1}, β_{2}, ⋯, β_{n}) of the MLR model [Eq. (20)]. Thus, the pragmatic form of the statistical regression model obtained after applying the least square method is as follows [42].

where _{i} = estimated error (or residual) for the ith observation.

Therefore, estimate of

The difference between the observed Y and the estimated

The purpose of developing MLR models is to establish a simple equation which is easy to use and interpret. The MLR modeling is very useful, especially in case of limited field data. Moreover, it is versatile as it can accommodate any number of independent variables [43].

### 4.18 The FAO-56 Penman-Monteith method

The FAO-56 PM method is recommended as the standard method for estimating ET_{o} in case of locations where measured lysimeter data is not available. The equation for the estimation of daily ET_{o} can be written as [3]:

where ET_{o} = reference evapotranspiration calculated by FAO-56 PM method (mm day^{−1}); *Rn* = daily net solar radiation (MJ m^{−2} day^{−1}); *γ* = psychrometric constant (kPa ^{o}C^{−1}); Δ = slope of saturation vapor pressure versus air temperature curve (kPa ^{o}C^{−1}); *es* and *ea* = saturation and actual vapor pressure (kPa), respectively; *T* = average daily air temperature (°C); *G* = soil heat flux (MJ m^{−2} day^{−1}); *Ws* = daily mean wind speed (m s^{−1}).

The ET_{o} values obtained from above equation are used as target data in ANN due to unavailability of lysimeter measured values.

## 5. Methodology

For the purpose of this study, 15 different climatic locations distributed over four agro-ecological regions (AERs) are selected. The selected locations are Parbhani, Kovilpatti, Bangalore, Solapur, Udaipur (semi-arid); Anantapur and Hissar (arid); Raipur, Faizabad, Ludhiana, and Ranichauri, (sub-humid); and Palampur, Jorhat, Mohanpur, and Dapoli (humid). Daily climate data of T_{min}, T_{max}, RH_{min}, RH_{max}, W_{s}, S_{ra} for the period of 5 years (January 1, 2001 to December 31, 2005) was collected from All India Coordinated Research Project on Agrometeorology (AICRPAM), Central Research Institute for Dryland Agriculture (CRIDA), Hyderabad, Telangana, India. These data were used for the development and testing of various ANN-based ET_{o} models. Due to the unavailability of lysimeter measured ET_{o} values for these stations, it is estimated by the FAO-56 PM method, which has been adopted as a standard equation for the computation of ET_{o} and calibrating other Eqs. [10]. The normalization technique was applied to both the input and target data before training and testing such that all data points lies in between 0 and 1. The normalization process removes the cyclicity of the data. The following procedure was adopted for normalizing the input and output data sets. Each variable, X_{i,} in the data set was normalized (X_{i, norm}) between 0 and 1 by dividing its value by the upper limit of the data set, X_{i, max}. Resulting data was then used for mapping.

ANN simulated ET_{o} was converted back to original form by denormalization procedure. The data from 2001 to 2005 was splitted into training (70% of 2001–2004), validation (30% of 2001–2004), and testing (2005) sets. ANN models were trained with the LM algorithm consists of one hidden layer (sigmoid transfer function) and one output layer (linear transfer function). The parameters that were fixed after a number of trials include: RMSE = 0.0001, learning rate = 0.65, momentum rate = 0.5, epochs = 500, and initial weight range = −0.5 to 0.5. The developed various ANN models were compared with basic statistical MLR models. The developed ANN models were evaluated and compared based on different error functions described in Table 2. Training window of the model contains general information used for training the networks like, error tolerance, Levenberg parameter (lambda) and maximum cycles of simulation. For weights selection, two options are there, weights can be randomized or it can be read from an existing weight file of previous training.

Evaluation criteria | Formulae |
---|---|

Root Mean Squared Error (RMSE) | |

Coefficient of determination (R^{2}) | |

Ratio of average output and target ET_{o} values (R) |

## 6. Results and discussion

### 6.1 Development of ANN models for daily ET_{o} estimation

ANN model with six climatic variables (T_{max}, T_{min}, RH_{max}, RH_{min},W_{S}, and S_{ra}) were trained and tested to evaluate the feasibility of ANN models corresponding to FAO-56 PM conventional ET_{o} method for 15 individual locations in India. In order to highlight the necessity of using complex ANN models, it is necessary to show the results obtained using MLR models.

### 6.2 Training of ANN models for daily ET_{o} estimation

All the ANN models were trained as per the procedure mentioned in methodology and after each training run; three performance indices (RMSE, R^{2}, and R_{ratio}) were calculated, to find the optimum neural network. Several runs were used for determining the optimal number of hidden neurons with different architectural configurations. The optimum neural network was selected based on criteria such that the model has minimum RMSE and maximum R^{2} values. Here, it is worth to mention that the R_{ratio} is used only to know whether the models overestimated or underestimated ET_{o} values. Training with higher number of hidden nodes might increase the performance of ANN models. But training with a several number of hidden nodes requires more computation time and cause complexity in architecture as it has to complete number of epochs [7]. Therefore, to avoid the above difficulty, the selection of an optimum node was fixed with a trial run of 1–15 hidden nodes only (i.e., not tried beyond 15 hidden nodes). Figure 6 shows the relationship between RMSE and number of hidden nodes of ANN models for four locations (Parbhani, Hissar, Faizabad, and Dapoli) during training. These locations are chosen randomly from each agro-ecological region such that Parbhani, Hissar, Faizabad, and Dapoli represent semi-arid, arid, sub-humid, and humid climates, respectively.

For ANN models, the best network was resulted at a hidden node of *i* + 1 (where *i* = number of nodes in the input layer) for most of the locations. Thus, *i* + 1 hidden nodes are sufficient to model the ET_{o} process using the ANN models [13, 14, 15, 16, 44, 45, 46]. Table 3 shows the performance statistics of ANN models for 15 locations during training. The results pertaining to the optimal network structure of ANN models, resulted at *i* + 1 hidden nodes, are only summarized in Table 3 for 15 locations.

AER | Location | ANN | ||
---|---|---|---|---|

RMSE | R^{2} | R_{ratio} | ||

Semi-arid | Parbhani | 0.141 | 0.991 | 0.997 |

Solapur | 0.271 | 0.969 | 1.000 | |

Bangalore | 0.296 | 0.972 | 1.005 | |

Kovilpatti | 0.254 | 0.991 | 1.000 | |

Udaipur | 0.391 | 0.952 | 1.003 | |

Arid | Anantapur | 0.363 | 0.972 | 0.986 |

Hissar | 0.052 | 0.999 | 1.000 | |

Sub-humid | Raipur | 0.255 | 0.981 | 0.982 |

Faizabad | 0.060 | 0.999 | 1.001 | |

Ludhiana | 0.289 | 0.977 | 0.999 | |

Ranichauri | 0.909 | 0.411 | 1.004 | |

Humid | Palampur | 0.177 | 0.988 | 0.999 |

Jorhat | 0.615 | 0.943 | 1.001 | |

Mohanpur | 0.377 | 0.904 | 1.002 | |

Dapoli | 0.150 | 0.990 | 1.000 |

### 6.3 FAO-56 PM-based ANN models

ET_{o} process is a function of various climatic factors (T_{max}, T_{min}, RH_{max}, RH_{min}, W_{S}, and S_{ra}). Therefore, it is pertinent to take into account the combined influence of all the climatic parameters on ET_{o} estimation. The ANN models corresponding to the FAO-56 PM were developed considering T_{max}, T_{min}, RH_{max}, RH_{min}, W_{s}, and S_{ra} as input and the FAO-56 PM ET_{o} as target. Table 4 shows the performance statistics of ANN and MLR models for 15 locations during testing. Comparison of results obtained using MLR and ANN models indicated that the ANN models performed better than the MLR models for all locations except for Bangalore. This is confirmed from the low values of RMSE (mm day^{−1}) and high values of R^{2} for ANN models as compared to the MLR models.

AER | Location | MLR | ANN | ||||
---|---|---|---|---|---|---|---|

RMSE | R^{2} | R_{ratio} | RMSE | R^{2} | R_{ratio} | ||

Semi-arid | Parbhani | 0.308 | 0.963 | 1.002 | 0.115 | 0.995 | 0.994 |

Solapur | 0.313 | 0.959 | 1.003 | 0.228 | 0.979 | 0.988 | |

Bangalore | 0.159 | 0.980 | 1.000 | 0.201 | 0.968 | 0.994 | |

Kovilpatti | 0.233 | 0.977 | 0.999 | 0.200 | 0.984 | 1.004 | |

Udaipur | 0.295 | 0.975 | 1.001 | 0.119 | 0.996 | 0.992 | |

Arid | Anantapur | 0.275 | 0.977 | 1.000 | 0.222 | 0.984 | 0.998 |

Hissar | 0.434 | 0.951 | 0.999 | 0.280 | 0.980 | 1.000 | |

Sub-humid | Raipur | 0.420 | 0.943 | 1.002 | 0.296 | 0.972 | 1.005 |

Faizabad | 0.357 | 0.957 | 1.002 | 0.286 | 0.973 | 1.011 | |

Ludhiana | 0.348 | 0.971 | 0.999 | 0.279 | 0.981 | 1.000 | |

Ranichauri | 0.265 | 0.961 | 0.999 | 0.137 | 0.989 | 1.005 | |

Humid | Palampur | 0.313 | 0.952 | 1.003 | 0.228 | 0.979 | 1.031 |

Jorhat | 0.151 | 0.978 | 1.000 | 0.137 | 0.985 | 1.019 | |

Mohanpur | 0.170 | 0.983 | 1.001 | 0.123 | 0.991 | 1.007 | |

Dapoli | 0.177 | 0.973 | 1.001 | 0.152 | 0.981 | 1.009 |

The R_{ratio} values of MLR models for 15 locations are nearly approaching one, which simply indicates that on an average these models neither over- nor under-estimated ET_{o}. However, high values of RMSE and R^{2} indicate that on a daily basis, these models over- and under-estimated ET_{o} values. Though the performance of ANN models was good as compared to MLR models, in some locations these models over- or under-estimated the ET_{o} values. The ANN models overestimated (R_{ratio} > 1) ET_{o} values at Palampur. The over- and under-estimations by all ANN models for the above locations were less than 3% which is negligible. The overall performance of all the models was represented as ANN > MLR for most of the locations except for Bangalore where, the performance of models was represented as MLR > ANN. The results suggest that the non-linearity of ET_{o} process can be adequately modeled using ANN models.

The scatter plots of the FAO-56 PM ET_{o} and ET_{o} estimated with the ANN models for 15 climatic locations in India are shown in Figure 7. The scatter plots confirm the statistics given in Table 4. Regression analysis was performed between the FAO-56 PM ET_{o} and ET_{o} estimated with the ANN and the best-fit lines are shown in Figure 7. The values of R^{2} for ANN models were found to be >0.968. The fit line equations (*y* = *a*_{0}*x* + *a*_{1}) in Figure 7 gave the values of *a0* and *a1* coefficients closer to one and zero, respectively. Due to the superior performance of ANN models over the MLR models, the time series plots of these models with 1 year data (during testing) for four selected locations Parbhani, Hissar, Faizabad, and Dapoli are shown in Figure 8. The location figures indicated that, ET_{o} estimated using ANN models matched well with the FAO-56 PM ET_{o} except for a few peak values in case of Faizabad.

## 7. Summary and conclusions

Evapotranspiration is an important and one of the most difficult components of the hydrologic cycle to quantify accurately. Prior to designing any irrigation system, the information on crop water requirements or crop evapotranspiration is needed, which can be calculated using reference evapotranspiration. There exist direct measurement methods (lysimeters) and indirect estimation procedures (physical and empirical based) for modeling ET_{o}. Direct methods have the limitations of arduous, cost-effective, and lack of skilled manpower to collect accurate measurements. The difficulty in estimating ET_{o} with the indirect physically based methods is due to the limitations of unavailability of all necessary climate data, whereas the application of empirical methods are limited due to unsuitability of these methods for all climatic conditions and need of local calibration. ANNs are efficient in modeling complex processes without formulating any mathematical relationships related to the physical process. This study was undertaken to develop ANN models corresponding to FAO-56 PM conventional ET_{o} method for 15 individual stations in India.

The potential of ANN models corresponding to the FAO-56 PM method was evaluated for 15 locations. The ANN models were developed considering six inputs (T_{max}, T_{min}, RH_{max}, RH_{min}, W_{s}, and S_{ra}) and the FAO-56 PM ET_{o} as target. The optimum number of hidden neurons was finalized with a trial of 1–15 hidden nodes. The ANN models gave lower RMSE values at *i* + 1 (*i* = number of inputs) hidden nodes for estimating ET_{o}. Comparison results of MLR and ANN models indicated that the ANN models performed better for all locations. However, on an average the over- and under-estimations of ET_{o} (<3% which is negligible) estimated by using MLR models was less as compared to ANN models. In brief, based on the above discussion on ET_{o} modeling, the following specific conclusions are drawn:

For estimating ET

_{o}using ANN model, a network of single hidden layer with*i*+ 1 (*i*= number of input nodes) number of hidden nodes was found as adequate.ANN-based ET

_{o}estimation models performed better than the MLR models for all locations.

However, it should be noted that only climate data from different agro-ecological regions of India was used in this analysis and the results might be different for various climates in other countries.