## 1. Introduction

Air pollution in urban environments has serious health and quality of life implications. A wide variety of anthropogenic air pollution sources increase the levels of background air pollutant concentrations, leading to the deterioration of the ambient air quality. Principal sources of urban air pollution are vehicular traffic, industrial activity and in general fossil fuel combustion, introducing a mixture of chemical components, particulate matter and biological material into the atmosphere. The deterioration of urban air quality is considered worldwide one of the primary environmental issues and current scientific evidence associate the exposure to ambient air pollution with a wide spectrum of health effects like cardiopulmonary diseases, respiratory related hospital admissions and premature mortality (Analitis et al. 2006; Ito et al., 2005; Samet et al., 2000). Direct measurements of sensitive population groups’ exposure to air pollution are scarce and therefore methods of accurate point and areal air quality estimations are prerequisite. This fact highlights the importance of generating accurate fields of air pollution for quantifying present and future health related risks.

In the field of air pollution modeling, two different approaches have been adopted by the scientific community, differentiated by their applied fundamental principles. The first approach involves the numerical simulation of atmospheric dispersion based on the current understanding of physics and chemistry that govern the transport, dispersion and transformation of pollutants in the atmosphere. The modeling process typically requires a set of parameters such as meteorological fields, terrain information along with a comprehensive description of pollution sources. An alternative approach is based on statistical analysis of pollutant concentrations collected from air quality monitoring networks commonly deployed in urban areas. The reasoning of the statistical approach is that physical processes are likely to induce correlations in air quality data collected over space and time. Statistical models generate predictions by exploiting these spatio-temporal patterns, enabling the estimation of pollutant concentrations in unmonitored locations.

The chapter’s main objective is to present and review the statistical spatial interpolation methodologies which are commonly employed in the field of air pollution modeling. An additional scope of the chapter is to compare and evaluate the accuracy of the interpolation methods for point estimations, using data from a real urban air quality monitoring network located at the greater area of metropolitan Athens in Greece.

## 2. Spatial interpolation methodologies

Spatial interpolation is the procedure of estimating the values of the variable under study at unsampled locations, using point observations within the same region. Statistical interpolation methodologies are applied in air pollution modeling for estimating the spatial distribution of pollutants, based on data provided from an existing air quality monitoring network. Such networks consist of a number of irregularly distributed stations and therefore the issue of air pollution spatial interpolation is essentially a problem in the field of scattered data approximation. The monitoring sites are typically located to detect high concentrations and although such a configuration is appropriate for identifying the maximum potential exposure and risk, in many cases it fails to describe the spatial variability of air pollution. Recent developments in the design and the modification of existing air quality monitoring networks address this issue (Lozano et al., 2009; Mofarrah & Husain, 2010; Nejadkoorki, 2011) and propose methodologies for an optimum configuration of monitoring sites used for more reliable and cost-effective spatial predictions.

The spatial interpolation methodologies addressed in this review consist of some of the most widely used interpolation schemes and can be classified according to their inherent general features. The following classifications and characteristics have been adopted in a number of spatial interpolation reviews (Eberly et al., 2004; Li & Heap 2008) as they serve the purpose of illustrating the suitability of each methodology for the spatial process of interest.

*Point – Areal*. Point based interpolation methodologies produce estimates in discrete unsampled locations based on point observations of the variable under study, whereas the areal based schemes estimations refer to a whole predefined area or zone.*Global – Local*. Global methodologies utilize all available data of the variable under study and develop a global estimation function for the study area, while local schemes segment the area and operate independently within a window, using only a fraction of the available data.*Exact – Approximate*. The methodologies which produce an estimation surface that essentially passes from the observed data are called exact interpolators, while approximate interpolators make use of the available data to produce the estimation surface, which is not required to predict exactly the observed values.*Stochastic – Deterministic*. Stochastic methodologies incorporate the concept of randomness providing therefore additionally the uncertainty associated with the prediction, while deterministic methods make no use of the probability theory.*Gradual – Abrupt*. Depending on the degree of smoothness of the resulting estimation surface the methodologies are classified as gradual or abrupt interpolators.

The reviewed methodologies include the nearest neighbor, the triangulated irregular network interpolators, the natural neighbor, which utilize the tessellation of the spatial field of interest, the inverse distance weighted methodologies, the radial basis function and thin plate spline schemes and the geostatistical Kriging set of interpolation methodologies. Additionally the artificial neural network approach to spatial interpolation is presented. The majority of the interpolation schemes are based on the weighted average approach, differentiating in the procedure used to calculate the interpolation weights (*w*
_{
i
}).

### 2.1. Nearest neighbor

The nearest neighbor scheme is the simplest methodology of multivariate interpolation and it is based on the Voronoi diagram (Dirichlet or Thiessen tessellation). Given a set of distinct points in the Euclidean plane, for constructing the Voronoi diagram, all locations in that space are associated with the closest member of the point set. The result is a tessellation of the plane into a set of the regions associated with members of the point set (Okabe et al., 2000). A sample Voronoi diagram is illustrated in Fig. 1, which corresponds to the partition of the area of study for the case of the air quality monitoring network in Athens, which is used in this work for the evaluation of the interpolation methodologies in air pollution modeling. The nearest neighbor scheme utilizes the Voronoi diagram and for point predictions all estimations within a region are set as equal to the nearest point or to the average its surrounding points.

The nearest neighbor methodology is selected for its computational simplicity and its main weakness is that each prediction is based on just one measurement and therefore information from other neighboring points is ignored (Webster & Oliver, 2007). Furthermore, it may be more suitable for data that are more discrete than continuous in nature (Eberly et al., 2004). These drawbacks limit the use of the nearest neighbor scheme in air pollution modeling which has been used in spatial interpolation comparative studies (Deligiorgi et al., 2010; Marshal et al., 2008; Shao et al., 2007) and for air quality data imputation (Junninen et al., 2002).

### 2.2. Triangulated irregular network

The triangulated irregular network (polynomial interpolation) is a group of methodologies that utilize the triangulation of the input space for estimating the values of the variable under study in unsampled locations. Delaunay triangulation is typically employed, which assigns triangles in the dataset by employing the criterion that no data point is positioned inside the circumcircle of any other triangle (Watson & Philip, 1984). Delaunay triangulation maximizes the minimum angle of all the triangle angles and thus it avoids extremely sharp and obtuse angles. The Voronoi tessellation is the geometrical dual of the Delaunay triangulation, obtained by perpendicularly bisecting the edges of the triangulation. In the case of the air quality monitoring network in Athens, both diagrams are presented in Fig. 2. Triangulated irregular network estimates for a target point within one of the formed triangles is performed by fitting a polynomial function to the values at the vertices of the formed triangles. The predictive surface is typically generated by using a least-squares regression fit that minimizes squared differences between the surface and measured points (Eberly at al., 2004).

### 2.3. Natural neighbor

The natural neighbor interpolation scheme is also based on the Voronoi tessellation and is a local interpolant, where the estimated function is a linear appropriately weighted average of the nearby data points. The selection of the nearby sites (natural neighbors) and the interpolation weights is based on a methodology proposed by R. Sibson (Sibson, 1981). Initially for a given set of irregular points S = {S_{1} … S_{n}}, the original Voronoi diagram is constructed (Fig. 1). Subsequently, the target interpolation point *q* is inserted in the diagram, altering the Voronoi tessellation, which for the case of the air monitoring network in Athens is illustrated in Fig. 3. The point *q* in the new Voronoi diagram is associated with the region *V(q)* and the interpolation weights *w*
_{
i
} are calculated from (Fan, 2005):

where *s*
_{
i
} the natural neighbors of *q* and *V(s*
_{
i
}
*)* their associated regions in the original Voronoi diagram. The natural and the nearest neighbor schemes belong to the polygonal interpolation group of methodologies and are local, deterministic and abrupt schemes.

### 2.4. Inverse distance weighting

The Inverse distance weighting class of interpolation schemes are developed by D. Shepard and are based on the assumption that the degree of influence of the nearby sample points should be greater than the effect of more distant points. The interpolant is also a weighted average of the sample point values and the weights *w*
_{
i
} are expressed as an inverse function of distance according to:

where *N*, the number of the sample points, *d*
_{
i
} the distance of the target point from each of the sample points and *p* the power parameter. The selection of *p* is arbitrary (typically equal to 2) and expresses the relative importance of the nearby points in relation to the distant points. As *p* becomes larger the weight assigned to points in large distances becomes smaller. Especially for *p*
0, the IDW estimate is equal to the spatial average of all sampled points, and when *p*
∞, the methodology is identical with the nearest neighbor scheme. The IDW schemes therefore can be global or local and are deterministic, exact and gradual interpolation methods. Although IDW methodology is one of the most widely used interpolation schemes, it has several limitations, mainly due to the fact that the spatial relationship between two locations is not simply a function of distance and in many cases this distance-decay relationship is not constant over space (Lu & Wong, 2008). In air pollution modeling the IDW method is a popular alternative to Kriging methodology in various scales. It is applied operationally by the Environmental Protection Agency (EPA) for generating real-time O_{3}, PM_{10} and Air Quality Index spatial predictions in nationwide scales (Eberly et al., 2004).

### 2.5. Radial Basis Functions

The use of Radial Basis Functions (RBF) as estimators for interpolating irregularly distributed data is introduced by R.L. Hardy (Hardy, 1971) and since then they have been widely used in various disciplines in multivariate scattered data interpolation. The method is based on the assumption that the estimated surface can be approximated to any degree of exactness by the summation of regular mathematically defined surfaces (Hardy, 1978). The RBF interpolation schemes are global, exact, deterministic interpolators. The RBF approximation function can be expressed by:

where the estimation function *y* is a linear aggregate of *N* radially symmetric functions *φ* for a given norm ║∙║, with Euclidean norm being the most common choice (Baxter, 1992). The data points are called the centers (*c*
_{
i
}) of the RBF interpolant and each *φ* is associated with one of these centers. The coefficients *w*
_{
i
} are chosen so that the interpolation conditions are satisfied and estimated by solving the system of linear equations (Rousos & Baxter, 2005):

where

is the interpolation matrix, *w* the vector of coefficients and *f* the vector of function values. Some common *φ* functions are the linear and cubic:

the Gaussian, the multiquadric and the inverse multiquadric::

The parameter *c*, in the Gaussian, the multiquadric and the inverse multiquadric RBFs is called the shape parameter and can be chosen arbitrarily, with a reasonable choice being in the region of the average euclidean distance between the centers. In many cases the quality of approximation is very sensitive to the shape parameter. The main advantage of the RBF interpolation schemes is that they require generally few point observations but relatively well distributed centers in order to create a good approximation of the studied process. The method in general doesn’t perform well for a process with abrupt changes in small distances and as an exact method, for processes associated with large observation errors.

### 2.6. Thin plate splines

Thin plate splines were initially introduced for interpolating meteorological fields (Wahba & Wendelberger, 1980) and can be considered as a special form of RBF based interpolation. The physical analog of the term ‘thin plate spline’, refers to the bending of a thin sheet of metal so it would pass through a predefined set of points. The Spline interpolation schemes in the case of scattered data interpolation is used to find the minimum curvature surface that passes through a set of irregularly spaced data points (Sandwell, 1987). The two fundamental conditions a thin plate spline satisfies are the exactness and smoothness (Myers, 1994):

where *D* is a second order differential operator and *R* the region of interest. An alternative restriction for a thin plate spline is the minimization of the quantity (Myers, 1994):

This expression represents a tradeoff between the exactness and smoothness of the solution. The interpolant obtained from above restriction no longer interpolates the noisy data, but smooths it, with the smoothing parameter *λ* controlling the tradeoff between exactness and smoothness (Wahba, 1990).

### 2.7. Kriging

Kriging interpolation schemes are the fundamental tools in the field of spatial statistics, developed by the founder of geostatistics G. Matheron and named after D.G. Krige who proposed the technique for mining applications (Krige, 1951). The main difference of the Kriging schemes from the above non-geostatistical methodologies is the assumption that spatial correlation structure of the process under study is known and it is estimated from the observed data. The Kriging interpolation schemes are stochastic, local, gradual and exact interpolators. The Kriging methodology includes two stages: the analysis of the spatial variation and the estimation of the target variable, which is also based on the weighted average approach.

The analysis of the spatial variation is performed through the variogram assessment. The observed data are initially used for the estimation of the semivariance from:

where *z*, the variable under study, *h* the lag distance between the pre-ordered data and *n(h)* the number of the paired data at distance *h*. The plot of the semivariance versus the lag distance is the empirical variogram, which is subsequently modeled by a parametric model. The most common models are the linear, the circular, the spherical, the exponential, the Gaussian or a nested model of one or more of the above. The characteristic parameters of the variogram are the sill, which is the semivariance value at the plateau of the variogram, representing the distance where there is no correlation between the observations, the range, which is the lag where the semivariance is equal to the sill and the nugget which represents theoretically the minimum variance and is the semivariance value at zero lag distance (Fig.4).

The Kriging scheme is also known as the best linear unbiased estimator, and its estimates are based on the variogram model and the values and location of the measured points (Eberly et al., 2004). The Kriging interpolation weights are chosen using the modeled variogram so that the estimate is unbiased and the estimation variance is less than any other linear combination of the observed values. The most common Kriging methods are the Simple Kriging, which assumes a known constant mean, Ordinary Kriging, which assumes that there is an unknown constant mean, estimated from the data and Universal Kriging which assumes that there is a trend in the surface that partly explains the data’s variations (Eberly et al., 2004).

### 2.8. Artificial Neural Networks

Recent advances in artificial intelligence propose an additional approach to the problem of point spatial interpolation of irregularly and gridded data An Artificial Neural Network (ANN) is a mathematical model inspired from the brain physiology to take advantage of their advanced problem solving abilities. Biological brains contain specialized cells called neurons, which interconnect and form highly complex networks. A biological neuron is consisted of a cell body (soma), the dendrites, which extend out of the cell body and an axon which is the transmitter of the neuron. In terms of information processing, the basic operation of an artificial neuron, is the summation of all the incoming signals through its dendrites. A pulse, in accordance with the biological functioning, is sent along the axon if sufficient input signal is received to stimulate the neuron to its threshold level. This simplified neuron operation (McCulloch-Pitts model) can be modeled by the data flow diagram presented in Fig. 5.

The output of the neuron in Fig. 5 is calculated using the following equation:

where *f* is the binary threshold function. The abilities of the McCulloch-Pitts neuron model can be extended by replacing the activation functions and depending on the application, linear, sigmoid, tangent hyperbolic and other functions can be used.

An ANN is set of interconnected artificial neurons which can be classified according to their topology. The most commonly employed ANNs are the Feed Forward Networks, which have no feedback connections and their most important characteristic is the lack of memory. The output of such a network depends solely on its current inputs and weights and therefore the order in which the elements of a time series are presented to the network has no effect in its output. Feed forward ANN’s are universal approximators, theoretically capable of estimating any measurable input-output function to any desired degree of accuracy (Hornik et al., 1989). In Fig. 6 their most common topology is presented, which contains three layers of neurons.

The input layer contains a set of artificial neurons with linear activation functions, while in the hidden layer, non-linear activation functions are used. The third layer is the output layer and depending on the application, linear or non-linear activation functions can be used. Besides feed forward ANNs, several topologies exist with feedback connections and their most important feature is the existence of memory. Such networks are called recurrent and their output depends on the current and previous inputs, since these define the state of its memory.

The behavior of an ANN, besides its topology, is also affected by the input neuron weights. The process of determining the weights is called training, which corresponds to the natural learning process in biological brains. Training requires the existence of a training data set, which is composed of input vectors along with the desired output from the network. In general, training can be considered as an optimization problem (selecting the set of weights that minimizes some error measure) and therefore any optimization algorithm can be applied, with backpropagation commonly employed. A successfully trained network is able to generalize, in terms of producing an output for an input that was not presented during training. Such an output can be considered as an interpolation or prediction, depending on the network input. Several issues need to be addressed during training in order to obtain an ANN with satisfactory generalization ability. A common problem, related to the entire family of gradient descent based training algorithms like backpropagation, is that a suboptimal set of weights can be initially determined. A technique to overcome this issue is the multiple training of the ANN, by altering the initial weights values. Another common problem related with training is directly related to the extreme flexibility of ANNs. Specifically, as the number of neurons increases so does the ability of the network to exploit patterns in the training set, irrelevant to the statistical dependence between input and output, such as noise (Blackwell et al., 2009). This effect is known as overtraining and the result is highly degraded generalization ability for the trained ANN. Overtraining can be resolved using early stopping, which requires the partition of the original training set to the normal training set and the validation set, which is continually used during training to evaluate the generalization ability of the ANN. A decrease in this ability will trigger a premature ending of the training process in order to avoid overtraining.

Besides these inherent drawbacks, the wider acceptance of ANNs for statistical interpolation and prediction is limited due to the requirement of a representative training set. This fact differentiates the ANNs from the previous interpolation methodologies.

## 3. Application of spatial interpolation methods

The accuracy assessment for the aforementioned spatial interpolation methodologies is performed for the greater area of metropolitan Athens in Greece. The area under study is the Athens basin, which is bounded by Mount Aegaleo to the west, Mount Parnitha to the north, Mount Pentelikon to the northeast, Mount Hymmetus to the east and the Saronic Gulf coastline to the southwest. These geophysical features along with the prevailing meteorological conditions, land–use types and pollution sources, create a complex pattern of air pollution spatial distribution. The main sources of pollution in the area are vehicular traffic, industrial activities and central heating emissions. The dispersion conditions at the Athens basin have been extensively studied focusing on air pollution episodes in relation to atmospheric conditions (Asimakopoulos et al., 1992; Lalas et al., 1983; Kassomenos et al. 1995; Güsten et al. 1988).

The accuracy assessment of the interpolation methodologies is carried out for Nitrogen Dioxide (NO_{2}) which is a primary pollutant and tropospheric Ozone (O_{3}) which is secondary photochemical pollutant. The analysis is carried out for their mean daily concentrations from the 1^{st} of January 2009 to the 31^{st} of December 2009. The air quality monitoring stations used for the purpose of this study are operated by the Hellenic Ministry of Environment, Energy and Climate Change (MEECC) and their locations are presented in Fig. 7.

The air quality data provided from the MEECC database are mean hourly concentrations of NO_{2} and O_{3}, which are further averaged to provide the mean daily concentrations. Although the mean daily concentrations do not provide information regarding the diurnal pattern of each pollutant, a daily time scale is appropriate in epidemiological studies and for the assessment of the spatial interpolation methodologies. The hourly data availability and the characteristics of each station are presented in Table 1. Calendar days with missing mean daily observations from one or more stations are excluded from the dataset, resulting to a series of 227 and 269 days with concurrent observations for NO_{2} and O_{3} respectively. As it is observed the data availability for both pollutants is high for all stations except for Marousi and Peristeri and for NO_{2} at Piraeus.

### 3.1. Descriptive statistical analysis

The analysis of the mean concentration levels for the examined pollutants reveals the increased spatial variability at the area of study. For NO_{2} the maximum concentrations are observed for the urban station at Patision (91.23μgr/m^{3}) and the minimum for the suburban station at Aghia Paraskevi (18.31μgr/m^{3}). The spatial variability for O_{3} is also high at the Athens basin, but with the exact opposite distribution. Maximum concentrations are recorded for the suburban station of Aghia Paraskevi (84.91μgr/m^{3}) and minimum at Patision station (24.46μgr/m^{3}). The seasonal evolution of the studied pollutants is presented in Fig. 8 for the three stations situated at the urban core of metropolitan Athens (ATH, GEO, PAT) and for the suburban station at Marousi. It is clear that the values fall in different ranges and have different characteristics. Ozone maxima are observed at the suburban regions at the Athens basin, in locations where lower NO_{2} concentrations are reported. Furthermore, O_{3} exhibits a clear seasonal cycle, with maximum concentrations observed during the warm period of the year, while NO_{2} concentrations remain relatively stable throughout the year, with a minimum at August, due to the decreased vehicular traffic.

The quality of the spatial prediction obtained by the various methods can be partly explained from the spatial correlation between the recorded time-series, which is presented in Table 2. The maximum correlation coefficient is observed between MAR and LYK for both pollutants.

### 3.2. Methodology

The assessment of the spatial interpolation methodologies for the greater area of metropolitan Athens includes thirteen schemes in total, which are presented in detail at Table 3. The accuracy assessment of the evaluated interpolation methodologies is performed using the leave-one-out cross-validation methodology. According to this technique, one station is selected as the target station and an interpolation methodology is employed in order to estimate air pollution concentrations at its coordinates. The predictive accuracy of the interpolation scheme is evaluated against the observed concentrations. The leave-one-out cross-validation technique is applied multiple times for NO_{2} and O_{3} for the target stations situated at Athinas (ATH), Geoponiki (GEO), Marousi (MAR), Patision (PAT) and Peristeri (PER).

The application of the NN, Lin, Cub, NaN, IDW, ID^{2}W, RBFlin, SPtp and SPbh schemes is straightforward, following the theoretical background presented at the second section of this chapter. Regarding the Ordinary Kriging methodology, an automated approach is adopted, where isotropy is assumed. During the phase of variogram modeling the circular, spherical, pentaspherical, exponential and Gaussian models are used. The best model for each day and station is selected using the coefficient of determination (*R*
^{
2
}), and subsequently it is employed in the OK spatial interpolation phase. For the selected time frame, during the analysis of the spatial variation, in 56.04% of the days for NO_{2} and in 56.13% of the days for O_{3} the circular model is selected.

Regarding the ANN, the additional dataset required for training consists of daily NO_{2} and O_{3} concentrations from 2000 to 2008, acquired from the MEECC database. This dataset was randomly divided into training and validation sets, with the 80% of its values used for training and the remaining 20% for validation. The test set in this case, is also the mean daily NO_{2} and O_{3} concentrations from 2009. The validation set is used for avoiding overtraining and for selecting the optimum ANN architecture. Multiple ANNs are trained with varying numbers of hidden layer neurons (from 10-50) and the optimum architecture is the one that minimizes the Mean Absolute Error (MAE) on the validation set. In all cases for avoiding the inherent drawback of gradient decent algorithms, training is performed multiple times (50 repetitions). The selected architecture for each station and pollutant is presented in Table 4.

In the case of RBFg and RBFmq, the shape parameter was not chosen arbitrarily as the average Euclidean distance of all station pairs. A sensitivity assessment of the effect of the parameter *c* on the predictive accuracy of RBFg and RBFmq for each station and pollutant is performed using the additional dataset from 2000-2008. The optimum value of the parameter *c* is the one that minimizes the *MAE* and the results are presented in Table 5.

The performance evaluation of each interpolation scheme is based on a set of correlation and difference statistical measures (Willmott, 1982), along with the application of Wilcoxon’s sign rank test. The proposed measures for assessing the accuracy of the interpolation schemes include the correlation coefficient (*R*) and coefficient of determination (*R*
^{
2
}), the Mean Absolute Error (*MAE*), which is proposed as the optimal model performance error for dimensioned evaluations (Willmott & Matsuura, 2005), the Mean Absolute Percentage Error (MAPE), the Root Mean Square Error (*RMSE*), the Mean Bias Error (*MBE*) and the Index of agreement *d* (Willmott, 1982), which is a descriptive measure that scales with the magnitude of the variables, retains mean information and does not amplify outliers (Vicente-Serrano et al., 2003).

According to the above statistical criteria and in combination with data display graphics like scatterplots the best performing scheme is selected. In each case the best performing scheme is statistically tested for its equal predictive accuracy versus the rest interpolation schemes using the Wilcoxon’s signed-rank test (Diebold & Mariano, 1995):

where

and

where, T the number of the concurrent observations, *S*
_{
it
} the concentration at the i station at time t, *S’*
_{
it
} the prediction of the best performing interpolation scheme according to statistical measures, and *S’*
_{
itj
} the prediction of the j scheme at station i and at time t. The null hypothesis of equal predictive accuracy between the methodologies is evaluated by comparing the test statistic (S_{3a}) against the t-distribution. Superior forecasts are indicated by the sign on the test statistic, which is determined by the order which the error terms are subtracted from each other in *d*
_{
t
} (Snell et al., 2000).

### 3.3. Results

A general remark is that the performance of the various interpolation schemes varies significantly, depending on the target station and therefore a universal method cannot be selected for the entire field of interest as the optimum interpolation scheme. Furthermore, the air pollutant of concern affects the performance of spatial interpolation methods, attributed to the differentiations in transport mechanisms and chemical transformations.

Initially, the results for the primary pollutant NO_{2} are discussed. The main performance statistics are the MAE and MAPE, which are presented in Table 6. A direct finding is that the ANN scheme for NO_{2} consistently outperforms all interpolation methods for all target stations. Among these stations, the most difficult to be predicted is PAT station since all methods produce predictions with significant errors. However, even for this station, ANN predictions are distinctively better (MAPE = 15.67%) compared to other methods with MAPE values ranging from 28.22% for RBFg to 55.10% for IDW scheme. The best ANN predictions for NO_{2} are obtained for GEO and MAR having MAPE values 5.9% and 6.1% respectively. The results of the other methods for these two stations are also superior compared to the rest of the target stations, although they cannot compare favorably against ANN predictions. The above results are in accordance with the rest statistical measures. The explained variation by ANN, as it expressed by R^{2}
_{,} has a mean 77.29% for all target stations, varying from 59.04% for PAT to 90.46% for GEO station. Furthermore, high *d* values (greater than 0.92) are reported for all stations except PAT. Regarding the average model bias, the low MBE values are obtained for ANNs (ranging from -1.34μgr/m^{3} to 0.33μgr/m^{3}) provide strong indication of its superior predictive abilities compared to the other methods, which systematically underestimate or overestimate the observed concentrations.

Regarding O_{3} predictions, ANN outperforms the rest interpolation schemes for three target stations (namely ATH, PAT, PER) although its predictive ability is generally lower compared to NO_{2} predictions (slightly higher MAPE values). For the stations at GEO and MAR the best results are obtained from RBFg and NN respectively (Table 7). The unexpected superior performance of the simplistic NN scheme can be attributed to the extremely high spatial correlation (*R* = 0.94) between MAR and its nearest neighbor station, located at Lykovrisi. It should be noted that even for these two stations, ANN is among the best performing schemes. In detail, for ATH, PAT and PER the ANN’s *d* values are greater than 0.88 and the *RMSE* is lower than 13.89μgr/m^{3}. The higher prediction errors for all schemes are also observed for the urban PAT station, indicating the increased spatial variability of air pollution in the urban core of metropolitan Athens.

The scatter diagrams produced for each case are in accordance with the results from the statistical measures and some representative cases are presented in Fig. 9. Limited dispersion along the x=y axis is observed for the best performing schemes (for NO_{2} at GEO for ANN, for O_{3} at GEO for RBFg and for O_{3} at MAR for NN) while systematic under-prediction is observed for the IDW scheme for NO_{2} at GEO (MBE = -8.23μgr/m^{3}).

The methodologies based on the tessellation of the spatial field of interest (NN, Lin, Cub, NaN) along with the schemes that use simple functions of distance (IDW, ID^{2}W) for the estimation of the spatial distribution of air pollution, in most cases fail to describe adequately the spatial variation of NO_{2} and O_{3}. The spatial setting of the monitoring stations especially for the primary pollutant NO_{2} is not dense enough to provide adequate information for the efficient modeling of the air pollution levels. Regarding O_{3}, especially for the MAR station an increased performance of the simplified schemes is observed attributed to the existence of a highly representative station in its proximate region (LYK).

The results of the OK methodology are greatly affected by the sampling configuration of the monitoring network. In most cases, except for O_{3} at Geoponiki the OK prediction is comparable to more simplistic models, giving evidence that the scheme cannot give highly accurate results at every point of interest in the area under study. The performance of the methodology could however be increased taking into account, during the variogram analysis, the effect of meteorological factors and topography that induce the anisotropy in the spatial distribution of both primary and secondary pollutants. In our case anisotropism is ignored to simplify and automate the model fitting. The Gaussian scheme from the family of the RBF interpolants is found to be in average the best alternative scheme to ANNs with *d* values ranging from 0.62 at PAT to 0.89 at MAR and from 0.66 at PAT to 0.96 at GEO for NO_{2} and O_{3} respectively.

The results of the Wilcoxon’s test statistic are presented in Table 8 and are used for assessing the statistical significance of the aforementioned results. Regarding NO_{2} and for all stations the predictive ability of the ANN is assessed versus the alternative schemes, while for O_{3} the ANN is assessed for ATH, PAT and PER, the RBFg for GEO and the NN for MAR. The hypothesis of ANN’s equal predictive accuracy for NO_{2} is rejected in all cases at p<0.05 and in all cases except for NN, Lin and Cub for PER at p<0.01. The ANN predictive accuracy for O_{3} is statistical significantly superior from the rest interpolation schemes at ATH, PAT and PER at p<0.05 and in all cases except for RBFg at ATH and PER at p<0.01. Regarding the RBFg at GEO and the NN at MAR, the null hypothesis of equal predictive accuracy is accepted at p<0.05 for the Lin and the OK models and for the RBFlin and ANN models respectively.

## 4. Conclusion

The estimation of pollution fields, especially in densely populated areas, is an important application in the field of environmental science due to the significant effects of air pollution to public health. In this chapter the spatial interpolation methodologies, commonly employed for air pollution point estimations, are reviewed and their accuracy is assessed for five target sites located at the greater area of metropolitan Athens. In general, an optimal interpolation scheme that outperforms the rest methodologies at the area of study is not established. The errors associated with all interpolation schemes are related to the pollutant of concern and mainly to the spatial configuration of the monitoring network.

The Artificial Neural Network approach to point spatial interpolation is found to be specifically promising, compared to the traditional interpolation techniques. For the primary (NO_{2}) and secondary (O_{3}) pollutants addressed in this study, ANNs are found to be statistical significantly superior in all cases except for one case where their predications have equal predictive accuracy with the RBFg scheme and for O_{3} at the Marousi station, where the simplistic Nearest Neighbor model outperforms the trained ANN scheme. In this context the ANNs are proposed as an alternative method to spatial interpolation, with increasingly important applications where point air quality estimations are prerequisite. The performance of the ANNs can be further improved, since there is no immediate restriction to the number of their inputs, by including parameters related to the atmospheric dispersion and the chemistry of the pollutant under consideration. The critical requirement of a representative dataset during training is the basis for their limited operational applications in air quality spatio-temporal forecasting.