Limit concentration values of ambient air pollutants according to EC directives.
Over the past few decades, urbanization has resulted in severe problems, and the quality of human life has deteriorated in megacities around the world. This chapter deals with the forecasting ability of Artificial Neural Networks (ANNs) in predicting air quality as well as bioclimatic conditions in an urban environment.
For this purpose, different ANNs are demonstrated in this chapter. These ANNs have been developed to predict the air quality as well as the bioclimatic conditions within the Greater Athens Area (GAA), Greece, for the next three days (24- to 72-hour forecasts). For ANN training for both air quality and bioclimatic conditions, hourly values of specific meteorological parameters, such as air temperature, relative humidity, wind speed, wind direction, air pressure, sunshine and solar radiation, as well as hourly values of air pollutant concentrations, have been used. These hourly data were recorded at many different sites within the GAA by the network of the Greek Ministry of Environment, Energy and Climatic Change (GMEECC) during the period 2001-2005. Hourly values of barometric pressure and total solar irradiance for the same period were acquired from the National Observatory of Athens (NOA).
Apart from this introduction, the chapter is divided into nine sections. Section 2 gives a brief introduction to ANNs. Section 3 presents the air quality indices used in this work to describe the air quality within the GAA. Section 4 presents the bioclimatic indices, which describe human thermal comfort-discomfort due to meteorological conditions. Section 5 presents the statistical performance indices used to investigate the predictive ability and reliability of the developed ANN models. Section 6 describes the examined sites within the GAA and the data and methodology used in this study.
Section 7 presents the ANNs developed to predict the maximum daily value of the air pollution indices as well as the persistence of the phenomenon, namely the number of consecutive hours within the day with high air pollution. Section 8 presents the ANNs developed to predict the daily values of the bioclimatic indices as well as the number of consecutive hours within the day with bioclimatic conditions dangerous for human health. Section 9 presents the spatial variation of both air quality levels and human comfort-discomfort levels within the GAA. Section 10, the last one, briefly summarizes the results of the analysis and how they can contribute positively to the economy, energy, the environment and the quality of human life in general.
Finally, the results of this work have shown that the ANNs can give an adequate forecast of both air quality and bioclimatic conditions within the urban environment of the GAA for the next three days, at a statistical significance level of p < 0.01.
2. Artificial Neural Networks
Artificial Neural Networks (ANNs) are a branch of artificial intelligence developed in the 1950s with the aim of imitating the architecture of the biological brain. They describe the functioning of the human nervous system through mathematical functions. Typical ANNs use very simple models of neurons, which retain only very rough characteristics of the biological neurons of the human brain (McCulloch & Pitts, 1943). ANNs are parallel-distributed systems made of many interconnected non-linear processing elements (PEs), called neurons (Hecht-Nielsen, 1990). Scientific interest in ANNs has grown exponentially over the last decade, mainly owing to the availability of hardware that makes them convenient for fast data analysis and information processing (Viotti et al., 2002).
Figure 1 presents the structure of a biological neuron (upper graph) as well as the structure of an artificial neuron (lower graph).
ANNs have been applied to time series prediction (Lapedes & Farber, 1987; Werbos, 1988). Although their behaviour has been related to non-linear statistical regression (Bishop, 1995), the main difference is that ANNs seem naturally suited to problems with high-dimensional data, such as the identification of systems with a large number of state variables. In recent years, black-box approaches have been recognized as a viable alternative to conceptual models for input-output simulation and forecasting, while also shortening the time required for model development. In particular, ANNs have gained broad acceptance for predicting the time series of different pollutants, as shown by the reviews of Gardner & Dorling (1998a, 1998b).
Many ANNs have been developed for very different environmental purposes. Heymans & Baird (2000) used network analysis to evaluate the carbon flow model built for the northern Benguela upwelling ecosystem in Namibia. Antonic et al. (2001) estimated forest survival after the construction of the hydroelectric power plant on the Drava River, Croatia, by means of a GIS-constructed database and a neural network. Karul et al. (2000) used a three-layer Levenberg-Marquardt feed-forward neural network to model the eutrophication process in three water bodies in Turkey. Moustris et al. (2011) used ANNs for long-term precipitation forecasting, based on long-term monthly precipitation time series from four meteorological stations in Greece.
Viotti et al. (2002) used ANNs to forecast short- and medium-term concentration levels of some well-known pollutants in the urban area of Perugia, Italy. The ANN approach has also proved viable for forecasting O3, PM10, NO2 and NOx, outperforming alternative techniques in different case studies (Nunnari et al., 1998; Prybutok et al., 2000; Kolehmainen et al., 2001; Balaguer Ballester et al., 2002; Schlink et al., 2003; Corani, 2005; Slini et al., 2006; Dutot et al., 2007; Papanastasiou et al., 2007).
2.1. Multi-Layer Perceptron and feed-forward ANNs
The Multi-Layer Perceptron (MLP) is the most commonly used type of ANN. Its structure consists of Processing Elements (PEs) and connections (Hecht-Nielsen, 1991). The PEs, called neurons, are arranged in layers: the first layer is the input layer, one or more hidden layers follow, and the final layer is the output layer. The input layer serves as a buffer that distributes the input signals to the next layer, which is a hidden layer. Each neuron of a hidden layer is connected to all the neurons of the next layer, each connection having its own weight factor. Each hidden neuron sums its weighted inputs, processes the sum with a transfer function and passes the result to the next layer; when there are several hidden layers, they are connected in the same fashion. The neurons in the output layer compute their output in a similar manner. Finally, the output value of the ANN is compared with the target value and an error is estimated. The weight factors are then adjusted accordingly, and the training cycle is repeated until the error is acceptable for the application.
Since data flow within the network from one layer to the next without any return path, such ANNs are called feed-forward ANNs. The structure of a feed-forward Multi-Layer Perceptron can be represented as in Figure 2.
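The forward pass of a one-hidden-layer feed-forward MLP can be sketched in a few lines of Python. This is an illustrative sketch, not the software used in this chapter: it assumes a hyperbolic tangent transfer function in the hidden layer (the choice adopted in Section 2.2), a single linear output neuron and explicit thresholds (biases); the function and argument names are hypothetical.

```python
import math


def mlp_forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass through a feed-forward MLP with one hidden layer.

    w_hidden[j][k] : weight from input k to hidden neuron j
    b_hidden[j]    : threshold (bias) of hidden neuron j
    w_out[j]       : weight from hidden neuron j to the single output neuron
    b_out          : threshold (bias) of the output neuron
    """
    # Hidden layer: weighted sum of the inputs, then the tanh transfer function.
    hidden = [
        math.tanh(sum(w * y for w, y in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: weighted sum of the hidden outputs (linear output assumed).
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Two inputs, two hidden neurons, one output.
print(mlp_forward([1.0, 0.5], [[0.5, -1.0], [0.2, 0.4]], [0.0, 0.0], [1.0, 2.0], 0.1))
```

Signals only move forward, from the inputs through the hidden layer to the output, which is exactly what makes the network feed-forward.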
2.2. Feed-forward ANNs training and the Back-propagation training algorithm
In some cases, the training (learning) process of an ANN can end far from the overall optimum; the problem can then be solved only with a very good database, the best choice of input configuration for training, or more powerful learning algorithms (Viotti et al., 2002).
The back-propagation learning algorithm consists of two computational steps: a forward pass and a backward pass. In the forward pass, an input pattern vector is applied to the sensory nodes of the network, i.e. to the units in the input layer. The signals from the input layer are propagated to the units in the first hidden layer, and each unit produces an output. The outputs of these units are propagated to the units in the subsequent layers, and this process continues until the signals reach the output layer, where the actual response of the network to the input vector is obtained (Figure 2).
During the forward pass, the synaptic weights of the network are fixed. During the backward pass, on the other hand, the synaptic weights are all adjusted in accordance with an error signal, which is propagated backward through the network against the direction of synaptic connections.
The mathematical analysis of the algorithm is as follows (Viotti et al., 2002). In the forward pass, given an input pattern vector y(p), each hidden neuron j receives a net input:
where $w_{jk}$ represents the weight between hidden neuron j and input neuron k. Thus, hidden neuron j produces an output:
where f(x) is the activation function of the hidden layer. Different kinds of activation functions are reported in the literature, such as linear, sigmoid, hyperbolic tangent, logistic, etc. (Norgaard et al., 2000). In the following, a hyperbolic tangent activation function is considered for the neurons in the hidden layer; hence, the value returned by the activation function of hidden neuron j is:
$$f(x_j) = \tanh(x_j) = \frac{e^{x_j} - e^{-x_j}}{e^{x_j} + e^{-x_j}}$$
where $x_j$ denotes the net input of hidden neuron j.
The output neuron receives its inputs from the preceding hidden layer, so that its net input can be written as:
where $w_j$ represents the weight between the output neuron and hidden neuron j. It therefore produces the final output:
A single presentation of all the patterns is usually called an epoch. Many epochs are generally needed before the error becomes acceptably small. In batch mode, the error signal is calculated for each input pattern, but the weights are modified only after all the input patterns have been presented. The error function is based on the Mean Square Error (MSE), and the weights are modified accordingly:
where d is the desired or real output (monitored variable value) and y is the ANN output, i.e. the forecasted value. In batch mode, E is the sum of the squared errors over all the patterns of the training set. E is a differentiable function of all the weights (and thresholds), and therefore the gradient descent method can be applied. For the hidden-to-output connections, the gradient descent rule gives:
where η is the learning rate, a parameter that determines the size of the weight adjustment each time the weights are changed during training. Small learning rates cause small weight changes and large learning rates cause large changes (Attoh-Okine, 1999). The best learning rate is not known a priori; if the learning rate is 0.0, the network will never learn.
Refenes et al. (1994) reported that one- and two-layer networks with a learning rate of η = 0.2 and a momentum rate of 0.3 < α < 0.5 yield the best convergence. The momentum term is a factor used to speed up network training: it adds a proportion of the previous weight change to the current weight change.
Using the chain rule it can be written as:
Thus, the hidden to output connections are updated according to the following equation:
For the input to hidden layer connections the gradient descent rule is:
Then using the chain rule, we obtain:
In particular, it can be written as:
and, after some simple manipulations, we obtain:
Therefore, it can be written as:
from which the update rule for the input-to-hidden connections is obtained:
and finally we get:
It is worth noting that a network architecture with just one hidden layer and activation functions arranged as described above constitutes a universal approximator: it can theoretically approximate any continuous function to any degree of accuracy. In practice, such flexibility is not achievable, because the parameters must be estimated from sample data, which are both finite and noisy (Barazzetta & Corani, 2004).
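The forward and backward passes described above can be summarized compactly. The block below is a sketch in standard notation, not necessarily the exact formulation of Viotti et al. (2002). It assumes a single tanh hidden layer, a linear output neuron, thresholds omitted for brevity (they can be treated as weights on a constant input of 1) and the batch error $E = \tfrac{1}{2}\sum_p (d^{(p)} - \hat{y}^{(p)})^2$. The symbols $h_j^{(p)}$ (hidden output) and $\hat{y}^{(p)}$ (network output) are introduced here to keep them distinct from the input components $y_k^{(p)}$. The last line adds the momentum term α discussed above for a generic weight w.

```latex
\begin{aligned}
x_j^{(p)} &= \sum_k w_{jk}\, y_k^{(p)}, \qquad
h_j^{(p)} = \tanh\!\big(x_j^{(p)}\big), \qquad
\hat{y}^{(p)} = \sum_j w_j\, h_j^{(p)} \\
E &= \frac{1}{2} \sum_p \big(d^{(p)} - \hat{y}^{(p)}\big)^2 \\
\frac{\partial E}{\partial w_j} &= -\sum_p \big(d^{(p)} - \hat{y}^{(p)}\big)\, h_j^{(p)},
\qquad \Delta w_j = -\eta\, \frac{\partial E}{\partial w_j} \\
\frac{\partial E}{\partial w_{jk}} &= \sum_p
\frac{\partial E}{\partial \hat{y}^{(p)}}\,
\frac{\partial \hat{y}^{(p)}}{\partial h_j^{(p)}}\,
\frac{\partial h_j^{(p)}}{\partial x_j^{(p)}}\,
\frac{\partial x_j^{(p)}}{\partial w_{jk}}
= -\sum_p \big(d^{(p)} - \hat{y}^{(p)}\big)\, w_j \Big(1 - \big(h_j^{(p)}\big)^2\Big)\, y_k^{(p)},
\qquad \Delta w_{jk} = -\eta\, \frac{\partial E}{\partial w_{jk}} \\
\Delta w(t) &= -\eta\, \frac{\partial E}{\partial w} + \alpha\, \Delta w(t-1)
\end{aligned}
```

The factor $1 - (h_j^{(p)})^2$ is the derivative of the hyperbolic tangent. This is why the tanh choice keeps the input-to-hidden update cheap to compute from quantities already available after the forward pass.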
The ANNs operate on a data matrix containing many patterns, in which the patterns are the rows and the variables are the columns. This data set is a sample. More precisely, the forecasting model is obtained by giving the ANN three different subsets of the available sample: the training, the validation and the test subsets. These subsets are briefly described below:
Training subset: the group of data used to train the network with the gradient descent algorithm on the error function, in order to reach the best fit of the nonlinear function representing the phenomenon.
Validation subset: the group of data given to the network while still in the learning phase, on which the error is evaluated in order to retain the best thresholds and weights.
Test subset: one or more sets of new data, unknown to the ANN, which are used to evaluate ANN generalization, i.e. whether the model has effectively approximated the general function representative of the phenomenon instead of merely fitting the training data.
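The roles of the training and validation subsets can be illustrated with a small, self-contained Python sketch of batch back-propagation. It is a hedged illustration, not the software used in this chapter, and it rests on several assumptions. There is one tanh hidden layer with a linear output. Gradients are averaged over the patterns. The default η = 0.2 and α = 0.4 are picked from the range reported by Refenes et al. (1994). Initial weights are drawn from [-0.5, 0.5]. Finally, the validation subset is used only to keep the weights with the lowest validation error (a form of early stopping). All names are hypothetical.

```python
import copy
import math
import random


def forward(x, p):
    """Hidden activations (tanh) and linear output for one input pattern x."""
    h = [math.tanh(sum(w * xk for w, xk in zip(row, x)) + b) for row, b in zip(p["W"], p["b"])]
    return h, sum(v * hj for v, hj in zip(p["v"], h)) + p["c"]


def mse(data, p):
    """Mean squared error over a list of (inputs, target) pairs."""
    return sum((d - forward(x, p)[1]) ** 2 for x, d in data) / len(data)


def train(train_set, val_set, n_hidden=4, eta=0.2, alpha=0.4, epochs=300, seed=0):
    """Batch back-propagation with momentum; returns the weights with the lowest
    validation MSE seen during training, together with that MSE."""
    rng = random.Random(seed)
    n_in, n = len(train_set[0][0]), len(train_set)
    p = {"W": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
         "b": [0.0] * n_hidden,
         "v": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
         "c": 0.0}
    # Previous weight changes, used by the momentum term.
    prev = {"W": [[0.0] * n_in for _ in range(n_hidden)], "b": [0.0] * n_hidden,
            "v": [0.0] * n_hidden, "c": 0.0}
    best, best_val = copy.deepcopy(p), mse(val_set, p)
    for _ in range(epochs):
        # Accumulate -dE/dw over the whole training subset (batch mode),
        # with E = 0.5 * sum over patterns of (d - y)^2.
        g = {"W": [[0.0] * n_in for _ in range(n_hidden)], "b": [0.0] * n_hidden,
             "v": [0.0] * n_hidden, "c": 0.0}
        for x, d in train_set:
            h, y = forward(x, p)
            err = d - y
            g["c"] += err
            for j in range(n_hidden):
                g["v"][j] += err * h[j]
                # Back-propagate through tanh, whose derivative is 1 - tanh^2.
                delta = err * p["v"][j] * (1.0 - h[j] ** 2)
                g["b"][j] += delta
                for k in range(n_in):
                    g["W"][j][k] += delta * x[k]
        # Weight change = eta * (averaged negative gradient) + alpha * previous change.
        prev["c"] = eta * g["c"] / n + alpha * prev["c"]
        p["c"] += prev["c"]
        for j in range(n_hidden):
            for key in ("v", "b"):
                prev[key][j] = eta * g[key][j] / n + alpha * prev[key][j]
                p[key][j] += prev[key][j]
            for k in range(n_in):
                prev["W"][j][k] = eta * g["W"][j][k] / n + alpha * prev["W"][j][k]
                p["W"][j][k] += prev["W"][j][k]
        val = mse(val_set, p)
        if val < best_val:  # the validation subset decides which weights are kept
            best, best_val = copy.deepcopy(p), val
    return best, best_val
```

The test subset plays no part in `train`. It would be scored afterwards, e.g. with `mse(test_set, best)`, so that it remains genuinely unknown to the model.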
3. Air quality indices
Urban air pollution is a growing problem in large, heavily urbanized cities, where its adverse health effects have been established. Poor city design, combined with specific topographical and meteorological conditions that limit air circulation, is associated with frequent episodes of critically high atmospheric pollution. In some cases this forces the authorities to take extreme measures, such as restricting motor vehicle circulation in large areas of the city.
Air pollution indices are often used for better and more effective monitoring and analysis of air quality in big cities. Most of them are the outcome of epidemiological studies that investigated the impact of air pollution on public health. In this work, two air pollution indices are presented and applied in order to forecast the air quality within the GAA using ANNs.
3.1. Description of the European Regional Pollution Index (ERPI)
The European Regional Pollution Index (ERPI) was proposed and developed by Moustris (2009). It is based on the air pollution index known as the Regional Pollution Index (RPI), which the New South Wales government in Sydney, Australia, has used since the mid-1990s (NSW-EPA 1998, 2006).
ERPI was calculated using the thresholds prescribed by the European Community (EC) in the framework directive 1996/62/EC and its three daughter directives 1999/30/EC, 2000/69/EC and 2002/3/EC (Table 1). Because it is calculated from EC air pollution thresholds, the Australian RPI was renamed European Regional Pollution Index (ERPI).
In this work, ERPI was calculated for five main air pollutants: nitrogen dioxide (NO2), sulfur dioxide (SO2), carbon monoxide (CO), ozone (O3) and particulate matter with an aerodynamic diameter less than or equal to 10 μm (PM10). For any observed concentration $C_i$, the value of the sub-index $I_i$ is given by:
|Air Pollutant||Limit values|
|NO2||Hourly value: 200 μg/m3|
|SO2||Hourly value: 350 μg/m3|
|CO||Maximum daily mean value for 8 hours: 10 mg/m3|
|O3||Maximum daily mean value for 8 hours: 120 μg/m3|
|PM10||Mean daily value: 50 μg/m3|
Once a sub-index $I_i$ is obtained for each air pollutant (Table 1), the overall ERPI is simply taken as the maximum of all the $I_i$ values, according to the formula:
$$\mathrm{ERPI} = \max\left(I_1, I_2, I_3, I_4, I_5\right)$$
where $I_1$, $I_2$, $I_3$, $I_4$ and $I_5$ are the sub-indices of NO2, SO2, CO, O3 and PM10, respectively. An ERPI ≥ 50 means that at least one of the pollutants exceeds its limit value (Table 1). Table 2 presents the classification of air quality according to ERPI values (Moustris, 2009; Moustris et al., 2010).
|ERPI value||Class||Air quality|
|0 – 2||1||Very Good|
|2 – 21||2||Good|
|21 – 40||3||Satisfactory|
|40 – 60||4||Sufficient|
|60 – 79||5||Poor|
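A sketch of the ERPI calculation, combining the limit values of Table 1 with the class bounds of Table 2. The sub-index expression is an assumption here: the sketch uses a linear scaling $I_i = 50\,C_i/L_i$, where $L_i$ is the Table 1 limit value. It is chosen only because it satisfies the stated property that ERPI ≥ 50 whenever at least one pollutant exceeds its limit, and it should be treated as illustrative rather than as the definition of Moustris (2009). Function names are hypothetical.

```python
# EC limit values from Table 1 (μg/m3, except CO in mg/m3). Each concentration
# passed in must use the same averaging time as its limit value.
LIMITS = {"NO2": 200.0, "SO2": 350.0, "CO": 10.0, "O3": 120.0, "PM10": 50.0}

# Upper bounds of the ERPI classes of Table 2: (upper bound, class, air quality).
CLASSES = [(2, 1, "Very Good"), (21, 2, "Good"), (40, 3, "Satisfactory"),
           (60, 4, "Sufficient"), (79, 5, "Poor")]


def sub_index(pollutant, concentration):
    """ASSUMED linear sub-index: the EC limit value maps to I_i = 50."""
    return 50.0 * concentration / LIMITS[pollutant]


def erpi(concentrations):
    """Overall ERPI: the maximum of the pollutant sub-indices."""
    return max(sub_index(p, c) for p, c in concentrations.items())


def classify(value):
    """Class and label from Table 2. A value on a shared boundary goes to the
    lower class (an assumption). Returns None above the last listed class."""
    for upper, cls, label in CLASSES:
        if value <= upper:
            return cls, label
    return None


value = erpi({"NO2": 100.0, "O3": 150.0, "PM10": 40.0})
print(value, classify(value))
```

In this example the ozone sub-index dominates, which matches the later observation that high warm-season ERPI values are mostly due to ozone.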
3.2. Description of Daily Air Quality Index (DAQx)
A new impact-related air quality index, obtained on a daily basis and abbreviated as DAQx (Daily Air Quality Index), has recently been developed and tested by the Meteorological Institute of Freiburg, Germany, and the Research and Advisory Institute for Hazardous Substances, Freiburg, Germany (Mayer et al., 2002a, 2002b; Makra et al., 2003). DAQx considers the air pollutants SO2, CO, NO2, O3 and PM10. To enable a linear interpolation between index classes, DAQx is calculated for each pollutant by:
$$\mathrm{DAQx} = \frac{\mathrm{DAQx}_{up} - \mathrm{DAQx}_{low}}{C_{up} - C_{low}}\left(C_{inst} - C_{low}\right) + \mathrm{DAQx}_{low}$$
where $C_{inst}$ is the highest daily 1-hour concentration of SO2, NO2 and O3, the highest daily running 8-hour concentration of CO, and the mean daily concentration of PM10. $C_{up}$ and $C_{low}$ are the upper and lower thresholds of the concentration range of the specific air pollutant, and $\mathrm{DAQx}_{up}$ and $\mathrm{DAQx}_{low}$ are the DAQx values corresponding to $C_{up}$ and $C_{low}$, respectively (Table 3).
The daily DAQx value is taken as the highest of the values calculated for the individual pollutants.
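The per-pollutant interpolation and the daily maximum can be sketched as follows. Because the class boundaries of Table 3 are needed as data, the function takes them as an argument. The class rows in the usage example are hypothetical placeholders, not the DAQx classes of Mayer et al. (2002a, 2002b), and the function names are also illustrative.

```python
def daqx_pollutant(c_inst, classes):
    """Linearly interpolated DAQx for one pollutant.

    classes: rows (C_low, C_up, DAQx_low, DAQx_up), one per index class,
    taken from a class table such as Table 3 (values must be supplied).
    """
    for c_low, c_up, d_low, d_up in classes:
        if c_low <= c_inst <= c_up:
            return (d_up - d_low) / (c_up - c_low) * (c_inst - c_low) + d_low
    raise ValueError("concentration outside the tabulated class ranges")


def daqx_daily(c_inst_by_pollutant, tables):
    """Daily DAQx: the highest of the per-pollutant values."""
    return max(daqx_pollutant(c, tables[p]) for p, c in c_inst_by_pollutant.items())


# HYPOTHETICAL class rows, for illustration only (not the values of Table 3).
example_tables = {
    "O3": [(0.0, 100.0, 0.5, 1.5), (100.0, 200.0, 1.5, 2.5)],
    "NO2": [(0.0, 50.0, 0.5, 1.5), (50.0, 100.0, 1.5, 2.5)],
}
print(daqx_daily({"O3": 150.0, "NO2": 60.0}, example_tables))
```

Interpolating within a class, rather than returning the class number, is what lets DAQx take fractional values such as the 3.5 threshold discussed in Section 7.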
4. Bioclimatic indices
The growth of the city of Athens during the last decades and the phenomenon of urbanization (Philandras et al., 1999) have established the well-known Urban Heat Island (UHI) over a large part of the city, with marked effects on human thermal comfort-discomfort. Thermal comfort is defined as the condition of mind that expresses satisfaction with the thermal environment, i.e. the absence of thermal discomfort, or conditions in which 80% or 90% of people do not express dissatisfaction (Givoni, 1998).
Several indices describing human thermal comfort-discomfort have been developed worldwide. Three bioclimatic indices are presented in this chapter: the Discomfort Index (DI), the Cooling Power index (CP) and the Physiologically Equivalent Temperature (PET). These indices are briefly described below.
4.1. Discomfort Index (DI)
The Discomfort Index (DI) was originally developed by Thom (1959) and was supported by later works (Clarke & Bach, 1971; Giles et al., 1990). This index describes the degree of thermal load under various meteorological conditions and is suitable for both outdoor and indoor environments. It is useful for evaluating how the current temperature and relative humidity affect the sensation of sultriness or discomfort and may endanger the health of the population.
|DI (°C)||Classification of human comfort-discomfort sensation|
|DI<21||No discomfort feeling|
|21≤DI<24||Less than 50% of the total population feels discomfort|
|24≤DI<27||More than 50% of the total population feels discomfort|
|27≤DI<29||Most of the population feels discomfort|
|29≤DI<32||The discomfort is very strong and dangerous|
|DI≥32||State of medical emergency|
Several formulas of the index have been proposed, along with tables of boundary values that indicate degrees of comfort-discomfort. In the present work, we used the following formula of DI, calculated as a combination of air temperature T (°C) and relative humidity RH (%) (Giles et al., 1990):
$$\mathrm{DI} = T - 0.55\left(1 - 0.01\,\mathrm{RH}\right)\left(T - 14.5\right)$$
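The DI formula and the class boundaries of the DI table translate directly into code. This is a minimal sketch that assumes the Giles et al. (1990) form DI = T − 0.55(1 − 0.01·RH)(T − 14.5), with class lower bounds copied from the DI classification table. The function names are hypothetical.

```python
def discomfort_index(t, rh):
    """Thom's Discomfort Index (°C) in the form attributed to Giles et al. (1990).

    t : air temperature (°C); rh : relative humidity (%).
    """
    return t - 0.55 * (1.0 - 0.01 * rh) * (t - 14.5)


# Lower bounds of the DI classes in the classification table, highest first.
DI_CLASSES = [
    (32.0, "State of medical emergency"),
    (29.0, "The discomfort is very strong and dangerous"),
    (27.0, "Most of the population feels discomfort"),
    (24.0, "More than 50% of the total population feels discomfort"),
    (21.0, "Less than 50% of the total population feels discomfort"),
]


def classify_di(di):
    """Return the comfort-discomfort class for a DI value."""
    for lower, label in DI_CLASSES:
        if di >= lower:
            return label
    return "No discomfort feeling"


di = discomfort_index(32.0, 50.0)
print(round(di, 4), classify_di(di))
```

At RH = 100% the correction term vanishes and DI equals the air temperature. Drier air lowers DI for temperatures above 14.5 °C.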
4.2. Cooling Power index (CP)
The Cooling Power Index (CP) was developed by Siple & Passel (1945) and describes the loss of energy, per unit of time and body surface, which a human organism can tolerate. In contrast to DI, the CP index takes wind speed into consideration instead of relative humidity. It describes the heat flux per unit surface of the human body towards the environment and vice versa. For the calculation of the CP index, hourly values of air temperature (T, °C) and wind speed (V, m/s) were used. The calculation of CP is based on the following formula (Tzenkova et al., 2003):
|CP (W/m2)||Classification of human comfort|
|CP<0||Endothermal - very hot discomfort|
|0<CP≤174||Atonic – hot discomfort|
|175≤CP≤349||Hypotonic – hot sub comfort|
|350≤CP≤699||Neutral - comfort|
|700≤CP≤1049||Tonic – cold sub comfort|
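A sketch of the CP calculation and classification. The chapter cites Tzenkova et al. (2003) for the formula. This sketch instead assumes the classical Siple & Passel cooling power, K = (10√V + 10.45 − V)(33 − T) in kcal m⁻² h⁻¹, multiplied by 1.163 to convert to W/m², which may differ in detail from the formula actually used. It also closes the gaps between the integer class bounds of the CP table in an assumed way. Function names are hypothetical.

```python
import math


def cooling_power(t, v):
    """Cooling power (W/m2), ASSUMED Siple & Passel form converted from kcal/(m2 h).

    t : air temperature (°C); v : wind speed (m/s).
    """
    return 1.163 * (10.45 + 10.0 * math.sqrt(v) - v) * (33.0 - t)


def classify_cp(cp):
    """Classes of the CP table. The gaps between the tabulated integer bounds
    are closed by using the next class's lower bound (an assumption)."""
    if cp < 0:
        return "Endothermal - very hot discomfort"
    if cp < 175:
        return "Atonic - hot discomfort"
    if cp < 350:
        return "Hypotonic - hot sub comfort"
    if cp < 700:
        return "Neutral - comfort"
    if cp < 1050:
        return "Tonic - cold sub comfort"
    return None  # above the classes listed in the table


cp = cooling_power(30.0, 1.0)
print(round(cp, 2), classify_cp(cp))
```

Because the temperature factor is (33 − T), CP becomes negative whenever the air is warmer than about 33 °C, which is why the table's first class is defined by CP < 0.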
4.3 Physiologically Equivalent Temperature (PET)
The thermal index Physiologically Equivalent Temperature (PET) is based on the total energy balance of the human body. PET values were evaluated (Mayer & Höppe, 1987; Höppe, 1999) in order to interpret the degree of thermophysiological stress (Table 6). PET describes the effect of the thermal environment as a temperature value (°C), which makes it easier to understand for non-specialists. At night, the air temperature is very close to the PET value. PET has been applied in studies of heat waves and climatic variability (Nastos & Matzarakis 2008, Matzarakis & Nastos 2010) and of weather impacts on health (Nastos & Matzarakis, 2006).
The PET analysis was performed with the radiation and bioclimate model RayMan, which is well suited to calculating radiation fluxes and human biometeorological indices (Matzarakis et al., 1999, 2010) and was chosen for all our calculations of mean radiant temperature and PET. The RayMan model, developed according to Guideline 3787 of the German Engineering Society (VDI, 1998), calculates the radiation flux in simple and complex environments on the basis of various parameters, such as air temperature, air humidity, degree of cloud cover, time of day and year, albedo of the surrounding surfaces and their solid-angle proportions (Matzarakis et al., 2010).
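|PET (°C)||Thermal sensation||Physiological stress level|
|< 4||very cold||extreme cold stress|
|4 – 8||cold||strong cold stress|
|8 – 13||cool||moderate cold stress|
|13 – 18||slightly cool||slight cold stress|
|18 – 23||comfortable||no thermal stress|
|23 – 29||slightly warm||slight heat stress|
|29 – 35||warm||moderate heat stress|
|35 – 41||hot||strong heat stress|
|> 41||very hot||extreme heat stress|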
5. Statistical performance indices
The quality and reliability of the developed ANNs, concerning their ability to forecast both air quality and bioclimatic conditions within the GAA, were tested using several statistical indices that have already been applied in similar studies (Moustris et al., 2010). The statistical performance indices used in this work are briefly described below. The Mean Bias Error (MBE) is:
$$\mathrm{MBE} = \frac{1}{N}\sum_{i=1}^{N}\left(P_i - O_i\right)$$
where N is the number of data points, $O_i$ is the observed data and $P_i$ is the predicted data. The MBE represents the degree of correspondence between the mean forecast and the mean observation, i.e. it describes how much the model underestimates or overestimates the observed data. Positive/negative values indicate overestimation/underestimation.
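The Root Mean Square Error,
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_i - O_i\right)^2},$$
provides a measure of how well future outcomes are likely to be predicted by the model.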
The coefficient of determination (R2) indicates how much of the observed variability is accounted for by the estimated model (Kolehmainen et al., 2001). It is a number between 0 and +1 that measures the degree of association between two variables, and it is calculated according to the equation (Comrie, 1997):
where $O_{ave}$ is the average of the observed data.
A relative measure of error, called the index of agreement (IA), is also discussed in Willmott et al. (1985). The index of agreement is calculated according to the formula:
$$\mathrm{IA} = 1 - \frac{\sum_{i=1}^{N}\left(P_i - O_i\right)^2}{\sum_{i=1}^{N}\left(\left|P_i - O_{ave}\right| + \left|O_i - O_{ave}\right|\right)^2}$$
where $O_{ave}$ is the average of the observed data. This is a dimensionless measure limited to the range 0 to 1: IA = 0 means no agreement between prediction and observation, and IA = 1 means perfect agreement.
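The four performance indices can be computed directly from paired observed and predicted series. MBE, RMSE and IA follow their usual definitions (the IA form is that of Willmott et al., 1985). For R2 the sketch assumes the squared Pearson correlation between observations and predictions, which lies between 0 and 1 as stated above, although the exact expression of Comrie (1997) may differ. The function names are illustrative.

```python
import math


def mbe(obs, pred):
    """Mean Bias Error: positive values mean the model overestimates."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """ASSUMED form: squared Pearson correlation (undefined for constant series)."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    vo = sum((o - mo) ** 2 for o in obs)
    vp = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (vo * vp)


def index_of_agreement(obs, pred):
    """Willmott's index of agreement, between 0 (none) and 1 (perfect)."""
    mo = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.5]
print(mbe(observed, predicted), rmse(observed, predicted),
      r2(observed, predicted), index_of_agreement(observed, predicted))
```

MBE can be near zero even for a poor model when over- and underestimates cancel, which is why it is reported together with RMSE, R2 and IA.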
6. Data and methodology
For the calculation of the bioclimatic indices as well as the air quality indices, appropriate hourly meteorological data were used. More specifically, hourly values of air temperature (°C), relative humidity (%) and wind speed (m/s) were used for the DI and CP calculations. In addition to these meteorological parameters, total cloud cover (oktas) was taken into consideration for the PET calculation with the RayMan model (Matzarakis et al., 1999, 2010). The meteorological parameters used as inputs to the RayMan model were acquired from the National Observatory of Athens for the period 2001-2004. Furthermore, hourly values of air pollutant concentrations (NO2, SO2, CO, O3 and PM10) were used to estimate the two air quality indices ERPI and DAQx. The above datasets were recorded by the GMEECC network over the period 2001-2005 and concern nine (9) different regions within the GAA, namely Agia Paraskevi, Thrakomakedones, Lykovrissi, Maroussi, Liossia, Galatsi, Patission, Aristotelous and Geoponiki (Fig. 3). For brevity, the examined regions-stations are referred to by the following abbreviations: Agia Paraskevi (APA), Galatsi (GAL), Liossia (LIO), Maroussi (MAR), Patission (PAT), Aristotelous (ARI), Thrakomakedones (THR), Lykovrissi (LYK) and Geoponiki (GEO). The hourly values of barometric pressure and total solar irradiance for the same period were obtained from the National Observatory of Athens.
To describe the air quality within the GAA, the values of the air quality indices ERPI and DAQx were calculated on an hourly basis at seven regions-stations (APA, THR, LYK, MAR, LIO, GAL and PAT) with respect to the pollutants NO2, SO2, CO and O3, and at five regions-stations (APA, THR, LYK, MAR and ARI) with respect to particulate matter PM10. The maximum of the 24 hourly values was taken as the daily value of each of the two air quality indices. Thus, for each station-region, two daily values were calculated for each index: the first concerns the pollutants NO2, SO2, CO and O3, and the second concerns PM10. This was done because the daily concentrations of both PM10 and ozone are high enough that, if only one daily value were calculated for each index, it would not be possible to know whether that value was due to ozone or to PM10. Thereafter, an appropriate number of ANN models were developed and trained to predict, for the next three days, the daily value of each of the two air quality indices as well as the number of consecutive hours during the day in which the index exceeds a threshold value.
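Both daily quantities used as ANN targets can be derived from the hourly index series. The sketch below takes the daily value as the maximum of the 24 hourly values, as stated above. It interprets the persistence target as the longest uninterrupted run of hours meeting a threshold condition; that interpretation, and the function names, are assumptions for illustration.

```python
def daily_value(hourly):
    """Daily index value: the maximum of the hourly values of the day."""
    return max(hourly)


def longest_run(hourly, condition):
    """Longest run of consecutive hours for which condition(value) is True."""
    best = run = 0
    for value in hourly:
        run = run + 1 if condition(value) else 0  # reset the run when the condition fails
        best = max(best, run)
    return best


# Example: hourly ERPI values, persistence counted for ERPI >= 50.
erpi_hours = [10, 55, 60, 40, 52, 53, 54, 20]
print(daily_value(erpi_hours), longest_run(erpi_hours, lambda v: v >= 50))
```

The same helper covers the bioclimatic targets of the next paragraph, e.g. a condition `lambda v: v >= 24` for DI or `lambda v: v <= 174` for CP.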
The bioclimatic conditions within the GAA are interpreted using the bioclimatic indices DI and CP, which were calculated on an hourly basis at eight regions-stations (APA, THR, LYK, MAR, LIO, GAL, GEO and PAT). Next, the daily value of each index at each region-station was calculated. The calculation was carried out only for the warm period of the year (May-September), in order to describe human discomfort due to heat-stress weather conditions. Then, an appropriate number of ANNs were developed and trained to predict, for the next three days, the daily value of each of the two bioclimatic indices as well as the number of consecutive hours during the day in which the index is greater than a threshold value (DI ≥ 24 °C) or less than a threshold value (CP ≤ 174 W/m2). Furthermore, the mean daily values of the PET index were estimated only for the National Observatory of Athens, because only there were all the parameters required as inputs to the RayMan model available. Thereafter, the developed ANN was evaluated in forecasting PET for the next three days.
7. Air quality forecasting using ANNs
7.1. ANNs description
Six different ANNs were developed to forecast the air quality levels within the GAA. The first one (ANN#1) was trained to forecast the daily value of ERPI (for the pollutants CO, NO2, SO2 and O3) for the next day at seven different areas of the GAA (APA, THR, LYK, MAR, LIO, GAL and PAT). The second one (ANN#2) was trained to forecast the daily value of DAQx (for the pollutants CO, NO2, SO2 and O3) for the next day at the same seven areas. The third one (ANN#3) was trained to forecast, for each of the seven examined stations, the daily number of consecutive hours for the next day with at least one of the pollutant concentrations (CO, NO2, SO2 and O3) above a threshold according to the European Community directives. The fourth (ANN#4) was trained to forecast the daily value of ERPI (with respect to PM10) for the next day at five different areas of the GAA (APA, THR, LYK, MAR and ARI). The fifth (ANN#5) was trained to forecast the daily value of DAQx (with respect to PM10) for the next day at the same five areas. Finally, the sixth (ANN#6) was trained to forecast, for each of the five examined stations, the daily number of consecutive hours for the next day with PM10 concentrations above a threshold according to the EC directives.
In each case, the data group defined as “the training set”, used for ANN training, covers the period 2001-2004. The data group defined as “the validation set”, given to the network while still in the learning phase, accounts for 20% of “the training set” for each of the developed ANN models. Finally, “the test set” refers to the year 2005, which was completely unknown to the models, in order to reveal their true forecasting ability. Table 7 presents the input and output data for the six developed ANN models.
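The data partition described above can be sketched as a chronological split. The sketch is an assumption in two respects, neither of which the chapter specifies. It draws the validation 20% at random (with a fixed seed) from the 2001-2004 days, and it removes those days from the data used for the weight updates. The record layout `(date, inputs, target)` and the function name are hypothetical.

```python
import random
from datetime import date


def split_by_year(records, test_year=2005, val_fraction=0.2, seed=42):
    """Split (date, inputs, target) records into training, validation and test sets.

    Test set: every record of test_year, never shown to the network.
    Validation set: a random val_fraction of the earlier records (assumption).
    Training set: the remaining earlier records.
    """
    pool = [r for r in records if r[0].year < test_year]
    test = [r for r in records if r[0].year == test_year]
    shuffled = pool[:]
    random.Random(seed).shuffle(shuffled)
    n_val = round(val_fraction * len(shuffled))
    return shuffled[n_val:], shuffled[:n_val], test


# One illustrative record per month for 2001-2005.
recs = [(date(y, m, 1), [float(m)], float(y)) for y in range(2001, 2006) for m in range(1, 13)]
train_set, val_set, test_set = split_by_year(recs)
print(len(train_set), len(val_set), len(test_set))
```

Keeping the whole of 2005 out of both training and validation is what makes the test-year statistics a fair measure of forecasting ability on unseen conditions.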
The combination of input data used to train each ANN model was selected after a series of trial-and-error tests; in each case, the combination that gave the best forecasting result was retained (Table 7).
At this point, it should be mentioned that, in addition to other parameters, all the constructed ANN models use as input data the maximum and minimum air temperature and the maximum and minimum wind speed for the next day, as well as the mean daily barometric pressure and the mode daily wind direction for the next day. Because these inputs are themselves forecasts, this may limit the forecasting approach; however, these forecast values are readily available through the network of the Hellenic National Meteorological Service (HNMS).
|INPUT DATA (input layer)||ANN#1||ANN#2||ANN#3||ANN#4||ANN#5||ANN#6|
|Stations’ number (1,2,3,4,5,6,7)||√||√||√||√||√||√|
|Mean daily air pressure (mbar) for the six previous days||√||√||√||√||√||√|
|Daily sum of the global solar irradiance for the six previous days (W/m2)||√||√||√||√||√||√|
|Maximum (Tmax) and minimum (Tmin) daily temperature (°C) for the six previous days||√||√||√||√||√||√|
|Maximum (WSmax) and minimum (WSmin) daily wind speed (m/sec) for the six previous days||√||√||√||√||√||√|
|Maximum (RH%max) and minimum (RH%min) daily relative humidity for the six previous days||√||√|
|Cosine and sine of the mode daily wind direction for the six previous days||√||√||√||√||√||√|
|ERPI daily value for the six previous days||√||√||√||√|
|DAQx daily value for the six previous days||√||√|
|The number of consecutive hours during the day with ERPI≥50 for the six previous days||√||√|
|Mean daily air pressure (hPa) one day ahead||√||√||√||√||√||√|
|Maximum (Tmax) and minimum (Tmin) daily temperature (°C) one day ahead||√||√||√||√||√||√|
|Maximum (RH%max) and minimum (RH%min) daily relative humidity one day ahead||√||√|
|Maximum (WSmax) and minimum (WSmin) daily wind speed (m/sec) one day ahead||√||√||√||√||√||√|
|Cosine and sine of the mode daily wind direction one day ahead||√||√||√||√||√||√|
|OUTPUT DATA (output layer)|
|ERPI daily value for the next day||√||√|
|DAQx daily value for the next day||√||√|
|The number of consecutive hours with ERPI≥50 for the next day||√||√|
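The next-day input vector can be assembled from six days of lagged observations plus the next-day forecasts. The sketch below covers only a subset of the inputs listed in the table above: pressure, temperature, wind speed, wind direction and the previous index values. The dictionary keys and function names are hypothetical. Wind direction is encoded as its cosine and sine, as in the table, so that directions such as 359° and 1° map to nearby values.

```python
import math


def wind_components(direction_deg):
    """Cosine and sine of a wind direction given in degrees."""
    rad = math.radians(direction_deg)
    return math.cos(rad), math.sin(rad)


def build_input(history, forecast, lags=6):
    """Input vector for a next-day forecast.

    history  : daily dicts, oldest first, the last one being the current day
    forecast : dict of next-day forecast values (e.g. from a weather service)
    Keys are hypothetical: p_mean, t_max, t_min, ws_max, ws_min, wd_mode, index.
    """
    if len(history) < lags:
        raise ValueError("need at least %d previous days" % lags)
    x = []
    for day in history[-lags:]:
        x += [day["p_mean"], day["t_max"], day["t_min"], day["ws_max"], day["ws_min"]]
        x += wind_components(day["wd_mode"])
        x.append(day["index"])  # previous daily value of the air quality index
    # Next-day meteorological forecasts.
    x += [forecast["p_mean"], forecast["t_max"], forecast["t_min"],
          forecast["ws_max"], forecast["ws_min"]]
    x += wind_components(forecast["wd_mode"])
    return x


day = {"p_mean": 1013.0, "t_max": 30.0, "t_min": 20.0, "ws_max": 5.0,
       "ws_min": 1.0, "wd_mode": 0.0, "index": 45.0}
fc = {k: day[k] for k in ("p_mean", "t_max", "t_min", "ws_max", "ws_min", "wd_mode")}
print(len(build_input([day] * 7, fc)))
```

Station number, solar irradiance and relative humidity would be appended in the same way for the ANNs that use them.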
7.2. Forecasting of daily ERPI and DAQx values for the next day
The global fit agreement statistical indices as well as the excess statistical indices for the observed and predicted ERPI and DAQx values were calculated for the eight examined stations. More specifically, the Oave, Pave, MBE, RMSE, IA and R2 values for the ERPI index are presented in Table 8.
|Oave||Pave||MBE||RMSE||IA||R2||Oave||Pave||MBE||RMSE||IA||R2||Oave||Pave||MBE (hours)||RMSE (hours)||IA||R2|
Concerning the pollutants CO, NO2, SO2 and O3, the R2 values indicate a very satisfactory prediction for ERPI-ANN#1 (0.381 ≤ R2 ≤ 0.826) and for DAQx-ANN#2 (0.378 ≤ R2 ≤ 0.686) during the test year 2005. The IA values also indicate a very good prediction for ERPI-ANN#1 (0.717 ≤ IA ≤ 0.937) and DAQx-ANN#2 (0.746 ≤ IA ≤ 0.889). In all cases, the prediction for CO, NO2, SO2 and O3 appears considerably more successful using the ERPI, which is based on the European Community directives, than using the DAQx. Nevertheless, using both predictions together gives a more complete and reliable picture of air quality one day ahead within the GAA. As far as the persistence of air pollution (for CO, NO2, SO2 and O3) is concerned, ANN#3 appears to give an adequate prediction, with R2 values ranging between 0.017 and 0.605 and IA values between 0.299 and 0.877.
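The indices quoted above can be computed from paired observed and predicted series. The sketch below uses the textbook definitions: MBE as the mean of P - O, RMSE, Willmott's index of agreement for IA, and the squared Pearson correlation for R2. These definitions are assumptions here. Where the definitions given earlier in the chapter differ, those take precedence. The function name is illustrative.

```python
import math
from typing import Dict, Sequence


def fit_indices(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Global fit statistics for paired observed (O) and predicted (P) values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")
    o_ave = sum(observed) / n
    p_ave = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    mbe = sum(errors) / n  # positive => over-prediction on average
    rmse = math.sqrt(sse / n)
    # Willmott's index of agreement: 1 = perfect agreement, 0 = no agreement
    potential = sum((abs(p - o_ave) + abs(o - o_ave)) ** 2 for o, p in zip(observed, predicted))
    ia = 1.0 - sse / potential if potential > 0 else float("nan")
    # Coefficient of determination taken as the squared Pearson correlation
    cov = sum((o - o_ave) * (p - p_ave) for o, p in zip(observed, predicted))
    var_o = sum((o - o_ave) ** 2 for o in observed)
    var_p = sum((p - p_ave) ** 2 for p in predicted)
    r2 = cov * cov / (var_o * var_p) if var_o > 0 and var_p > 0 else float("nan")
    return {"Oave": o_ave, "Pave": p_ave, "MBE": mbe, "RMSE": rmse, "IA": ia, "R2": r2}
```

A forecast that is offset by a constant bias keeps R2 = 1 but is penalised by MBE, RMSE and IA. This is why R2 is normally reported alongside the error-based indices rather than on its own.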
Finally, the worst ERPI prediction appears at the region-station PAT (city centre), while the worst DAQx prediction appears at the region-station LIO (urban area). In general, the prediction for the stations closer to the GAA's downtown is poorer than for the peripheral regions-stations. This is likely due to the heavy traffic load and the poor air circulation within the city centre. Better ANN training therefore requires more data related to these factors.
Figure 4 presents the best (LYK) and the worst (PAT) ERPI prediction for the pollutants CO, NO2, SO2 and O3. The best (LYK) and the worst (LIO) DAQx prediction for the same pollutants are depicted in Figure 5. Accordingly, Figure 6 presents the best (THR) and the worst (APA) ERPI prediction for PM10, and Figure 7 shows the best (LYK) and the worst (ARI) DAQx prediction for PM10. During the warm period of the year (May-September), the ERPI values (Figure 4) exceed 50. This means that the concentration of at least one pollutant is above its threshold according to the EC directives. In most cases (more than 90%), the pollutant responsible for these high ERPI values is ozone. Figure 5 reveals the same pattern for DAQx: during the warm period, the daily DAQx values exceed 3.5, indicating bad air quality in most cases. As far as PM10 is concerned (Figures 6 and 7), concentrations exceed the threshold value on almost half of the days of the year, indicating bad air quality at most of the examined stations-regions.
8. Bioclimatic conditions forecasting using ANNs
8.1. ANNs description for DI and CP forecasting
Four different ANN models were developed to forecast the bioclimatic conditions within the GAA during the warm period of the year (May-September). The first one (ANN#7) was trained to forecast the daily value of Thom's DI for the next day at eight different areas of the GAA (APA, THR, LYK, MAR, LIO, GAL, GEO and PAT). The second one (ANN#8) was trained to forecast the daily value of the CP index for the next day at the same eight areas. The third one (ANN#9) was trained to forecast the daily number of consecutive hours with DI ≥ 24 °C for the next day at each of the eight examined stations. Finally, the fourth one (ANN#10) was trained to forecast the daily number of consecutive hours with CP ≤ 174 W/m² for the next day at each of the eight examined stations.
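The persistence targets of ANN#9 and ANN#10 require counting consecutive hours beyond a threshold. The sketch below makes two assumptions. First, it takes the daily value to be the longest uninterrupted run of hours meeting the condition; this is not spelled out here. Second, it uses the widely used Celsius form of Thom's discomfort index, DI = T - 0.55(1 - 0.01·RH)(T - 14.5); if the definition given earlier in the chapter differs, that definition governs. The function names are illustrative.

```python
from typing import Callable, Iterable, Sequence


def thom_di(t_c: float, rh_pct: float) -> float:
    """Thom's discomfort index (deg C) from air temperature (deg C) and relative humidity (%)."""
    return t_c - 0.55 * (1.0 - 0.01 * rh_pct) * (t_c - 14.5)


def longest_run(values: Iterable[float], condition: Callable[[float], bool]) -> int:
    """Length of the longest block of consecutive values satisfying `condition`."""
    best = current = 0
    for value in values:
        if condition(value):
            current += 1
            best = max(best, current)
        else:
            current = 0  # the run is broken
    return best


def di_persistence_hours(temps: Sequence[float], rhs: Sequence[float], threshold: float = 24.0) -> int:
    """Consecutive hours in one day with DI >= threshold (hourly inputs assumed)."""
    hourly_di = [thom_di(t, rh) for t, rh in zip(temps, rhs)]
    return longest_run(hourly_di, lambda di: di >= threshold)
```

The same `longest_run` helper covers the CP target by swapping the condition, e.g. `longest_run(hourly_cp, lambda cp: cp <= 174.0)`, with `hourly_cp` computed from whichever CP formula the chapter defines.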
In each case, the training set covers the period 2001-2004. The validation set is presented to the network during the learning phase and accounts for 20% of the training set for each of the above ANNs. Finally, the test set refers to the year 2005, which is completely unknown to the models, so that their forecasting ability can be assessed. Table 9 presents the input and output data for the four developed ANNs. The input combination for each ANN model was chosen through a series of trial-and-error tests, and the combination that gave the best forecasting result in each case was retained (Table 9).
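The data partition described above can be sketched as a split by calendar year, with the validation subset carved out of the training years. The text does not state how the 20% validation subset was drawn. The random, seeded selection below is an assumption, as are the function and parameter names.

```python
import random
from typing import Callable, Collection, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_year(
    records: Sequence[T],
    year_of: Callable[[T], int],
    train_years: Collection[int],
    test_years: Collection[int],
    val_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Return (fit, validation, test) subsets; validation is drawn from the training years."""
    train = [r for r in records if year_of(r) in train_years]
    test = [r for r in records if year_of(r) in test_years]  # never seen during learning
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    n_val = round(val_fraction * len(train))
    val_idx = set(rng.sample(range(len(train)), n_val))
    fit = [r for i, r in enumerate(train) if i not in val_idx]
    val = [r for i, r in enumerate(train) if i in val_idx]
    return fit, val, test
```

The same function can also be used for the PET models (training 2001-2003, evaluation 2004). In that case it is called with `train_years=range(2001, 2004)`, `test_years={2004}` and `val_fraction=0.0`.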
|INPUT DATA (input layer)||ANN#7||ANN#8||ANN#9||ANN#10|
|Stations’ number (1,2,3,4,5,6,7)||√||√||√||√|
|The maximum (Tmax) daily temperature for the six previous days.||√||√||√|
|The maximum (RHmax) daily relative humidity for the six previous days.||√||√|
|The maximum (DImax) daily value of DI for the six previous days.||√||√|
|The daily number of consecutive hours with DI≥24 °C for the six previous days.||√||√|
|The maximum (Vmax) daily wind speed for the six previous days.||√|
|The minimum (CPmin) daily value of CP for the six previous days.||√|
|The daily number of consecutive hours with CP≤174 W/m² for the six previous days.||√||√|
|The maximum (Tmax) and minimum (Tmin) daily temperature for the six previous days.||√|
|The maximum (Vmax) and minimum (Vmin) daily wind speed for the six previous days.||√|
|The maximum (CPmax) and minimum (CPmin) daily value of CP for the six previous days.||√|
|OUTPUT DATA (output layer)|
|The maximum (DImax) daily value of DI for the next day.||√|
|The minimum (CPmin) daily value of CP for the next day.||√|
|The daily number of consecutive hours with DI≥24 °C for the next day.||√|
|The daily number of consecutive hours with CP≤174 W/m² for the next day.||√|
8.2. DI and CP daily value forecasting for the next day
The global fit agreement statistical indices and the excess statistical indices for the observed and predicted values were calculated for each of the eight examined stations. More specifically, the Oave, Pave, MBE, RMSE, IA and R2 values for DI are presented in Table 10.
The R2 values indicate a very satisfactory prediction for DI-ANN#7 (0.676 ≤ R2 ≤ 0.841) and for CP-ANN#8 (0.591 ≤ R2 ≤ 0.814) during the test year 2005. The IA values likewise indicate a very satisfactory prediction for DI-ANN#7 (0.849 ≤ IA ≤ 0.956) and for CP-ANN#8 (0.813 ≤ IA ≤ 0.948). Regarding the persistence of the phenomenon, i.e. the daily number of consecutive hours with high discomfort conditions due to strong heat stress, ANN#9 and ANN#10 appear to give an adequate prediction during the test year 2005. The R2 values range between 0.140 and 0.832 for ANN#9 and between 0.443 and 0.812 for CP-ANN#10. The IA values range between 0.368 and 0.951 for ANN#9 and between 0.750 and 0.946 for ANN#10. The worst prediction of the daily number of consecutive hours with high discomfort conditions refers to the suburban region-station THR. This may be attributed to the fact that the bioclimatic conditions in this suburban region (Thrakomakedones) are better than in all the other examined regions within the GAA, owing to lower temperatures. At THR, both discomfort indices, DI and CP, exceed their thresholds only for short periods during the examined period. The training data therefore contain little "memory-experience" of persistent discomfort at THR, so the developed ANN models cannot be adequately trained to forecast the number of consecutive hours with strong discomfort there.
Figure 8 reveals that within the city centre (PAT), strong discomfort conditions (DI ≥ 24 °C) appear from the end of June to the first half of September. At the suburban station (THR), there is no significant discomfort according to DI values. Only a few days during the warm period exceed the threshold DI ≥ 24 °C, above which at least 50% of the population feels discomfort due to heat stress.
Figure 9 illustrates that close to the city centre (urban area of Galatsi), hot sub-comfort conditions according to CP values (CP ≤ 174 W/m²) appear from the middle of June until the first half of September. At the suburban station (THR), discomfort due to heat stress lasts from the beginning of July until the middle of August. In all the above cases, the prediction of bioclimatic conditions one day ahead using ANN models appears very satisfactory and feasible.
8.3. ANNs description for PET forecasting
Three ANNs were trained using the back-propagation algorithm to forecast the mean daily PET value for the next day (ANN#11), the next two days (ANN#12) and the next three days (ANN#13). The training dataset covers the period 2001-2003, while the validation dataset covers the year 2004. The year 2004 was completely unknown to the constructed models and was used to test their predictive ability. Superposed epoch analysis on the training datasets indicated that the three days preceding the occurrence of strong heat/cold stress are adequate to forecast PET for the following days. Thus, the inputs used for ANN training (Table 11) are the mean daily air temperature, relative humidity, wind speed and sunshine duration for the three previous days, recorded at the National Observatory of Athens.
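Superposed epoch analysis, as mentioned above, averages a variable at fixed lags relative to a set of key dates (here, hypothetically, days of strong heat or cold stress). Systematic behavior in the preceding days then stands out from day-to-day noise. The chapter does not describe its exact procedure, so the sketch below is a generic composite-mean version. The event list and the window length are inputs the analyst must supply, and the function name is illustrative.

```python
import math
from typing import Dict, Sequence


def superposed_epoch(
    series: Sequence[float],
    event_days: Sequence[int],
    lags_before: int = 3,
    lags_after: int = 0,
) -> Dict[int, float]:
    """Composite mean of `series` at each lag (in days) relative to the event days.

    Lag 0 is the event day itself; negative lags are the days before it.
    Events whose window falls outside the series are skipped at that lag.
    """
    composite: Dict[int, float] = {}
    for lag in range(-lags_before, lags_after + 1):
        values = [series[d + lag] for d in event_days if 0 <= d + lag < len(series)]
        composite[lag] = sum(values) / len(values) if values else math.nan
    return composite
```

Comparing the composite values at lags -3 to -1 with the overall mean of the series indicates how many preceding days carry a signal. The chapter reports that three days were adequate.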
Table 12 presents the fit agreement indices between the observed and the predicted PET values for the validation year 2004. The high values of IA and R2 indicate that the constructed ANNs have an excellent forecasting ability for PET over the next three days. This gives evidence that the developed ANNs, using simple meteorological parameters recorded in the previous three days, are capable of predicting a bioclimatic index that is not easily calculated (PET was estimated using the RayMan model). The most notable finding is the close agreement between observed and predicted PET values. Figure 9 depicts the predicted and observed mean daily PET time series for the next day (a), the next two days (b) and the next three days (c), along with the respective scatter plots.
|INPUT DATA (input layer)||ANN#11||ANN#12||ANN#13|
|Mean daily air temperature (0C) for the three previous days||√||√||√|
|Mean daily wind speed (m/s) for the three previous days||√||√||√|
|Mean daily relative humidity (RH%) for the three previous days||√|
|The sunshine duration (hours) for the three previous days||√||√||√|
|OUTPUT DATA (output layer)|
|Mean daily PET value for the next day||√|
|Mean daily PET value for the next two days||√|
|Mean daily PET value for the next three days||√|
|OUTPUT DATA||MBE (°C)||RMSE (°C)||IA||R2|
|Mean daily PET value for the next day (ANN#11)||+0.5||2.8||0.982||0.933|
|Mean daily PET value for the next two days (ANN#12)||+0.5||3.8||0.966||0.874|
|Mean daily PET value for the next three days (ANN#13)||+0.4||4.3||0.956||0.839|
9. Spatial distribution of air quality and bioclimatic conditions in the GAA
9.1. Spatial variation of air quality within GAA
The mean annual value of both air quality indices, ERPI and DAQx, was calculated for all the examined regions within the GAA over the period 2001-2005. Figure 10 shows the spatial variation of air quality levels within the GAA. As far as ERPI is concerned, only the station THR shows a satisfactory air quality level on an annual basis (ERPI < 40). The stations MAR, APA and GAL show a tolerable air quality level (ERPI < 50), while the air quality levels at the LYK, LIO and PAT stations are very close to the limit value (ERPI ≥ 50). Finally, the air quality level is poor at the city centre station ARI, which may be attributed to high PM10 concentrations almost throughout the year. It should be noted that the station PAT is also in the city centre, very close to ARI, but unfortunately no PM10 observations are available for this station. Similar conclusions are drawn for DAQx. The only exception is LIO, where the air quality levels appear much closer to those of GAL, MAR and APA.
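The station classification above can be expressed as a simple banding of each station's mean annual ERPI. The band edges (40 and 50) are taken from the text. Treating every mean at or above 50 as "poor" is an assumption, the function names are illustrative, and the station labels in any usage (e.g. `S1`, `S2`) are placeholders rather than real data.

```python
from statistics import mean
from typing import Dict, Sequence


def erpi_band(annual_mean: float) -> str:
    """Map a mean annual ERPI value to a descriptive air quality level."""
    if annual_mean < 40.0:
        return "satisfactory"
    if annual_mean < 50.0:
        return "tolerable"
    return "poor"  # at or above the ERPI limit value of 50


def station_levels(daily_erpi: Dict[str, Sequence[float]]) -> Dict[str, str]:
    """Average each station's daily ERPI values and classify the result."""
    return {station: erpi_band(mean(values)) for station, values in daily_erpi.items()}
```

Averaging over the whole year hides seasonal episodes such as the summer ozone exceedances discussed earlier. The banding therefore summarises chronic conditions only.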
9.2. Spatial variation of bioclimatic conditions within GAA
For the period 2001-2005, the mean annual value of both bioclimatic indices, DI and CP, was calculated for all the examined regions within the GAA. Figure 11 depicts the spatial variation of the bioclimatic conditions within the GAA during the warm period of the year (May-September), in which three different bioclimatic zones appear. The first is the north suburban zone (THR), which can be characterized as comfortable. The second zone surrounds the city centre (LIO, LYK, MAR and APA) and lies on the boundary between comfortable and warm. Finally, the third zone covers the city centre (GAL, PAT and GEO) and can be characterized as uncomfortable, i.e. a zone of strong heat stress.
Regarding the persistence of discomfort during the examined period 2001-2005, the highest mean seasonal number of consecutive hours per day with high levels of human discomfort occurs at the station PAT. There, the mean is 11.3 consecutive hours for DI and 13.6 for CP, compared with 1.0 and 2.7 consecutive hours, respectively, at the station THR. All the other examined regions-stations within the GAA present a bioclimatic behavior between those of PAT and THR. Assuming that cooling demand scales with the duration of discomfort, a given building within the city centre (PAT) needs roughly 5 to 11 times more cooling energy during the warm period of the year than the same building in the north suburban area (THR).
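The 5 to 11 range is the ratio of the PAT and THR persistence values. It rests on the simplifying assumption that cooling demand is proportional to the number of consecutive discomfort hours; it is not a building-energy calculation:

```latex
\frac{11.3\,\mathrm{h}}{1.0\,\mathrm{h}} \approx 11.3 \quad (\mathrm{DI}), \qquad
\frac{13.6\,\mathrm{h}}{2.7\,\mathrm{h}} \approx 5.0 \quad (\mathrm{CP})
```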
This study presents an application of ANN models to environmental issues and, more generally, to environmental management. A number of ANN models have been developed and trained to forecast air quality levels, as well as bioclimatic conditions, in different regions within the GAA. The findings of this work demonstrate the forecasting capacity of the ANN models.
The results showed that the use of ANN models as a forecasting tool is feasible and satisfactory at a statistically significant level of p<0.01. In particular, for next-day air quality forecasting, the R2 values ranged between 0.381 and 0.826 (ERPI) and between 0.378 and 0.686 (DAQx). The IA between the predicted and observed values ranged between 0.717 and 0.937 for ERPI forecasting and between 0.746 and 0.889 for DAQx forecasting. In all cases, air quality forecasting appears more successful using the ERPI air quality index than the DAQx; it should be noted that the ERPI is based on the European Community directives for air quality levels. The same holds for forecasting the persistence of air pollution episodes, i.e. the number of consecutive hours during the day with poor air quality.
Concerning the forecasting of bioclimatic conditions for the next day, the R2 values ranged between 0.676 and 0.841 for DI and between 0.591 and 0.814 for CP. The IA values ranged between 0.849 and 0.956 for DI and between 0.813 and 0.948 for CP. Taking into account the persistence of the phenomenon (the number of consecutive hours during the day with high discomfort conditions due to strong heat stress), it seems that ANN#9 (consecutive discomfort hours according to DI values) and ANN#10 (consecutive discomfort hours according to CP values) give an adequate prediction.
A remarkable finding of this research concerns PET forecasting for the next three days. The high values of IA (0.956 to 0.982) and R2 (0.839 to 0.933) indicate that the constructed ANNs have an excellent forecasting ability for PET, a more complex bioclimatic index based on the human energy balance. This gives evidence that the developed ANNs, using simple meteorological parameters recorded in the previous three days, are capable of predicting a bioclimatic index that is not easily calculated (PET was estimated using the RayMan model).