Developing Neural Networks to Investigate Relationships Between Air Quality and Quality of Life Indicators

Introduction
Quality of life (QOL) is an integral outcome measure in the management of diseases. It can be used to assess the results of different management methods, in relation to disease complications and in fine-tuning management methods (Koller & Lorenz, 2003). Quantitative analysis of quality of life across countries, and the construction of summary indices for such analyses, have been of interest for some time (Slottje et al., 1991). Most early work focused on largely single-dimensional analysis based on such indicators as per capita GDP, the literacy rate, and mortality rates. Maasoumi (1998) and others called for a multidimensional quantitative study of welfare and quality of life. The argument is that welfare is made up of several distinct dimensions, which cannot all be monetized, and heterogeneity complications are best accommodated in multidimensional analysis. Hirschberg et al. (1991) and Hirschberg et al. (1998) identified similar indicators, and collected them into distinct clusters which could represent the dimensions worthy of distinct treatment in multidimensional frameworks.
In this research effort we consider the role of air quality indicators in the context of economic and welfare life quality indicators, using artificial neural networks (ANN). We obtained the key variables (life expectancy, healthy life years, infant mortality, Gross Domestic Product (GDP) and GDP growth rate) and developed a neural network model to predict the air quality outcomes (emissions of sulphur and nitrogen oxides). Sustainability and quality of life indicators have been proposed recently by Flynn et al. (2002), and life quality indices have been used to estimate willingness to pay (Pandey & Nathwani, 2004). The innovative part of this research effort lies in the use of a soft computing, machine learning approach such as the ANN to predict air quality. In this way, we introduce the reader to a technique that allows the comparison of various attributes that impact the quality of life in a meaningful way.

Materials and methods
It is well known that the quality of the air in a locale influences the health of the population and ultimately affects other dimensions of that population's welfare and its economy. As a simple example, in cities where pollution levels rise significantly in the summer, worker absenteeism rates rise commensurately and productivity is adversely impacted. Other dimensions of the economy are influenced on "high pollution days" as well. For example, when outdoor leisure activity is restricted, this may have serious consequences for the service sector of the economy (Bresnahan et al., 1997). In this chapter, we introduce two measures of environmental quality, or air quality, as quality of life factors. A feature of these indices is that these types of pollution are created by some of the very activities that define economic development. The two factors under investigation here are sulphur oxides (SOx) and nitrogen oxides (NOx) (million tonnes of SO2 and NO2 equivalent, respectively). Sulphur oxides, including sulphur dioxide and sulphur trioxide, are reported as sulphur dioxide equivalent, while nitrogen oxides, including nitric oxide and nitrogen dioxide, are reported as nitrogen dioxide equivalent. Both are produced as byproducts of fuel consumption, as in the case of electricity generation. Vehicle engines also produce a large proportion of NOx. SOx is primarily produced when high-sulphur coal is burned, usually in large-scale industrial processes and power generation. Thus, the ratio of these emissions to the population is an indication of pollution control.
The following attributes of QOL have been used:


Life expectancy at birth: The mean number of years that a newborn child can expect to live if subjected throughout his life to the current mortality conditions (age specific probabilities of dying).


Healthy life years: The indicator Healthy Life Years (HLY) at birth measures the number of years that a person at birth is still expected to live in a healthy condition. HLY is a health expectancy indicator which combines information on mortality and morbidity. The data required are the age-specific prevalence (proportions) of the population in healthy and unhealthy conditions and age-specific mortality information.
A healthy condition is defined by the absence of limitations in functioning/disability. The indicator is also called disability-free life expectancy (DFLE). Life expectancy at birth is defined as the mean number of years still to be lived by a person at birth, if subjected throughout the rest of his or her life to the current mortality conditions (WHO, 2010).


Infant mortality: The ratio of the number of deaths of children under one year of age during the year to the number of live births in that year. The value is expressed per 1000 live births.


Gross Domestic Product (GDP) per capita: GDP is a measure of economic activity, defined as the value of all goods and services produced less the value of any goods or services used in their creation. These amounts are expressed in PPS (Purchasing Power Standards), i.e. a common currency that eliminates the differences in price levels between countries, allowing meaningful volume comparisons of GDP between countries.

Table 1. Descriptive statistics for all variables used in the analysis.
To perform the analyses, multi-layer perceptron (MLP) and radial-basis function (RBF) network models were developed in the SPSS v.19 statistical package (IBM, 2010). We specified that the relative number of cases assigned to the training:testing:holdout samples should be 6:2:1. This assigned 2/3 of the cases to training, 2/9 to testing, and 1/9 to holdout. For the MLP network we employed the back propagation (BP) optimization algorithm. As is well known, in BP the weighted sum of the inputs and the bias term is passed to the activation level through the transfer function to produce the output (Bishop, 1995; Fine, 1999; Haykin, 1998; Ripley, 1996). The sigmoid transfer function was employed (Callan, 1999; Kecman, 2001), because the algorithm requires a continuous, single-valued response function whose first derivative exists (Picton, 2000).
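The sample fractions and the sigmoid unit described above can be illustrated with a short Python sketch. It is not the SPSS implementation: the function names and the example weights are hypothetical, and the logistic form 1/(1 + exp(-x)) is assumed for the sigmoid transfer function.

```python
import math

def ratio_to_fractions(ratio=(6, 2, 1)):
    """Convert a training:testing:holdout ratio into sample fractions."""
    total = sum(ratio)
    return [r / total for r in ratio]

def sigmoid(x):
    """Logistic sigmoid (assumed form of the transfer function)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """First derivative of the sigmoid; it exists for every x,
    which is the property back propagation relies on."""
    s = sigmoid(x)
    return s * (1.0 - s)

def unit_output(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through the sigmoid."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(activation)

# Illustrative values only; they are not taken from the trained network.
fractions = ratio_to_fractions((6, 2, 1))          # 2/3, 2/9, 1/9
example = unit_output([1.0, 2.0], [0.5, -0.25], 0.0)
```

The fractions are targets rather than guarantees: if cases are assigned at random according to these proportions, the realised sample sizes in a small data set can differ noticeably from them.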
Before the input data records were presented to the ANN, a normalization process took place so that values with a wide range would not prevail over the rest. The autoscaling approach was applied. This method outputs a zero mean and unit variance for any descriptor variable (Dogra, Shaillay, 2010). Thus, each feature's values were normalized based on the following equation:

$$Z_i = \frac{X_i - \mu_i}{\sigma_i}$$

where $X_i$ was the ith parameter, $Z_i$ was the scaled variable following a normal distribution, and $\sigma_i$, $\mu_i$ were the standard deviation and the mean value of the ith parameter.
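A minimal Python sketch of the autoscaling step follows. The function name `autoscale` and the sample values are hypothetical. The population standard deviation is used here as an assumption, since the text does not state whether the population or the sample estimate was applied.

```python
import statistics

def autoscale(values):
    """Scale one feature to zero mean and unit variance: (x - mean) / sd."""
    mu = statistics.fmean(values)
    # Assumption: population standard deviation; statistics.stdev would
    # give the sample estimate instead.
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("a constant feature cannot be autoscaled")
    return [(x - mu) / sigma for x in values]

# Hypothetical feature values, for illustration only.
scaled = autoscale([2.0, 4.0, 6.0])
```

Each input variable would be scaled separately in this way, so that a variable measured in large units, such as GDP, does not dominate one measured in small units, such as a growth rate.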
These networks were trained in an iterative process. A single hidden layer architecture was followed in order to reduce the complexity of the network and increase the computational efficiency (Haykin, 1998). Two units were chosen in the hidden layer. The schematic representation of the neural network is given in Fig. 1.
The transfer functions (hidden layer activation functions and output function) determine the output by depicting the result of the distance function (Bors & Pitas, 2001; Iliadis, 2007). The schematic representation of the neural network with transfer functions is given in Fig. 2.

Results - Discussion
From the MLP analysis, 19 cases (70.4%) were assigned to the training sample, 2 (7.4%) to the testing sample, and 6 (22.2%) to the holdout sample. The records were chosen at random. The whole effort aimed at developing an ANN that would generalize as well as possible. The seven data records that were excluded from the analysis were countries that did not have available data on Healthy Life Years. Two units were chosen in the hidden layer.
Table 2 displays information about the results of training and applying the MLP network to the holdout sample. Sum-of-squares error is displayed because the output layer has scale-dependent variables. This is the error function that the network tries to minimize during training. One consecutive step with no decrease in error was used as the stopping rule. The relative error for each scale-dependent variable is the ratio of the sum-of-squares error for the dependent variable to the sum-of-squares error for the "null" model, in which the mean value of the dependent variable is used as the predicted value for each case. There appears to be more error in the predictions of emissions of sulphur oxides than in emissions of nitrogen oxides, in the training and holdout samples.
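The relative error defined above can be written out as a short Python sketch. The function names and the toy numbers are hypothetical and serve only to show the arithmetic; they are not values from Table 2.

```python
def sum_of_squares_error(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def relative_error(observed, predicted):
    """Model sum-of-squares error divided by that of the 'null' model,
    which predicts the mean of the observed values for every case."""
    mean_obs = sum(observed) / len(observed)
    sse_model = sum_of_squares_error(observed, predicted)
    sse_null = sum_of_squares_error(observed, [mean_obs] * len(observed))
    if sse_null == 0:
        # A constant dependent variable leaves the ratio undefined.
        raise ValueError("relative error cannot be computed")
    return sse_model / sse_null

# Toy values, for illustration only.
err = relative_error([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

A value below 1 means the network predicts better than the mean alone, a value of 1 means it does no better, and a value above 1 means it does worse. Any constant factor applied to the error function cancels in the ratio.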
The average overall relative errors are fairly constant across the training (0.779), testing (0.615), and holdout (0.584) samples, which gives us some confidence that the model is not overtrained and that the error in future cases scored by the network will be close to the error reported in this table. [...] 4), respectively. There appears to be more error in the predictions of emissions of sulphur oxides than in emissions of nitrogen oxides, something that we also pointed out in Table 2. The importance of an independent variable is a measure of how much the network's model-predicted value changes for different values of the independent variable. A sensitivity analysis is applied to compute the importance of each predictor. The importance chart (Fig. 5) shows that the results are dominated by GDP growth rate and GDP (strictly economic QOL indicators), followed distantly by the other predictors.

Fig. 5. MLP independent variable importance chart.

www.intechopen.com
From the RBF analysis, 19 cases (70.4%) were assigned to the training sample, 1 (3.7%) to the testing sample, and 7 (25.9%) to the holdout sample. The seven data records that were excluded from the MLP analysis were also excluded from the RBF analysis, for the same reason.
Table 4 displays the corresponding information from the RBF network. There appears to be more error in the predictions of emissions of sulphur oxides than in emissions of nitrogen oxides, in the training and holdout samples.
The difference between the average overall relative errors of the training (0.132) and holdout (1.325) samples must be due to the small data set available, which naturally limits the possible degree of complexity of the model (Dendek & Mańdziuk, 2008). [...] Finally, the importance chart for the RBF network (Fig. 8) shows that, once again, GDP growth rate and GDP are the most important predictors of sulphur and nitrogen oxides emissions.
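The idea behind the importance charts, namely how much the predicted value changes when one independent variable is varied, can be sketched in Python as follows. This is a generic perturbation-based sensitivity measure and not necessarily the procedure implemented in SPSS. The function `importance`, the stand-in model `toy_model`, and the data rows are all hypothetical.

```python
def importance(predict, rows, n_inputs):
    """Normalised sensitivity of a model's prediction to each input.

    For each input j, every case is re-scored with input j set to each
    value observed in the data while the other inputs stay fixed; the
    average range of the resulting predictions is the raw importance.
    """
    raw = []
    for j in range(n_inputs):
        levels = sorted({row[j] for row in rows})
        total_range = 0.0
        for row in rows:
            preds = []
            for value in levels:
                probe = list(row)
                probe[j] = value
                preds.append(predict(probe))
            total_range += max(preds) - min(preds)
        raw.append(total_range / len(rows))
    total = sum(raw)
    if total == 0:
        return [0.0] * n_inputs
    return [r / total for r in raw]   # importances sum to 1

def toy_model(x):
    """Stand-in for a trained network (hypothetical linear response)."""
    return 3.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]

rows = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
scores = importance(toy_model, rows, 3)
```

In this toy case the first input receives three quarters of the total importance and the third receives none, which mirrors the kind of ranking an importance chart conveys.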

Conclusions
The multi-layer perceptron and radial-basis function neural network models that were trained to predict air quality indicators from life quality and welfare indicators appear to perform reasonably well. Unlike traditional statistical methods, the neural network models provide dynamic output as further data is fed to them, and they do not require performing and analyzing sophisticated statistical methods (Narasinga Rao et al., 2010).
[...] showed that GDP growth rate and GDP mainly influenced air quality predictions, while life expectancy, infant mortality and healthy life years followed distantly. One possible way to improve the performance of the network would be to create multiple networks. One network would predict the country result, perhaps simply whether the country increased emissions or not, and then separate networks would predict emissions conditional on whether the country increased emissions. We could then combine the network results, which would likely yield better predictions. Note also that a neural network is open-ended; as more data is given to the model, the predictions would become more reliable. Overall, we find that predictors that include economic indices may be employed by investigators to represent dimensions of air quality that include, as well as go beyond, these simple indices.
[...] the MLP network does a reasonably good job of predicting emissions of sulphur and nitrogen oxides. Ideally, the linear regression parameters a and b should have values 0 and 1, respectively, while values of the observed-by-predicted chart should lie roughly along a straight line. Linear regression gave results for the two output [...] Figs 3 and 4 actually seem to suggest that the largest errors of the ANN are overestimations of the target values.
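The observed-by-predicted regression check can be sketched in Python as follows, assuming that a denotes the intercept and b the slope of the line observed = a + b * predicted. The function name `fit_line` and the sample values are hypothetical and are not the emissions data.

```python
def fit_line(predicted, observed):
    """Ordinary least-squares fit of observed = a + b * predicted."""
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    if sxx == 0:
        raise ValueError("predicted values are constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(predicted, observed))
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept
    return a, b

# Illustrative values: a perfect predictor gives a = 0 and b = 1.
a_ideal, b_ideal = fit_line([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])

# A predictor that overestimates large targets gives a slope below 1.
a_over, b_over = fit_line([1.0, 2.0, 4.0], [1.0, 2.0, 3.0])
```

An intercept far from 0 or a slope far from 1 indicates systematic over- or underestimation in part of the range, which is what a visual reading of the observed-by-predicted charts is meant to reveal.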

Fig. 3. Linear regression of observed values for emissions of sulphur oxides by predicted values of MLP.

Fig. 4. Linear regression of observed values for emissions of nitrogen oxides by predicted values of MLP.

Fig. 6. Linear regression of observed values for emissions of sulphur oxides by predicted values of RBF.

Fig. 7.
GDP growth rate: The calculation of the annual growth rate of GDP volume is intended to allow comparisons of the dynamics of economic development both over time and between economies of different sizes. To measure the growth rate of GDP in terms of volumes, the GDP at current prices is valued in the prices of the previous year, and the volume changes computed in this way are imposed on the level of a reference year; this is called a chain-linked series. Accordingly, price movements will not inflate the growth rate.

Data were extracted for 34 European countries, for the year 2005, from the Eurostat database (Eurostat, 2010). Descriptive statistics for all variables are given in Table 1.
* Number of observations (countries) for each variable.
** Number of countries that did not have available data.
a. Cannot be computed. The dependent variable may be constant in the training sample.

Table 4. RBF Model Summary.
In Table 5, parameter estimates for the input and output layers are given for the RBF network.