Open access peer-reviewed chapter

Advances in Airborne Pollution Forecasting Using Soft Computing Techniques

By Aceves-Fernandez Marco Antonio, Sotomayor-Olmedo Artemio, Gorrostieta-Hurtado Efren, Pedraza-Ortega Jesus Carlos, Ramos-Arreguín Juan Manuel, Canchola-Magdaleno Sandra and Vargas-Soto Emilio

Submitted: October 7th 2010. Reviewed: March 17th 2011. Published: July 5th 2011

DOI: 10.5772/16273

Downloaded: 2139

1. Introduction

Many investigations of Particulate Matter (PM) 2.5 and PM10 in urban and suburban environments have been reported in the scientific literature [Vega et al 2002, Querol et al 2004, Fuller et al 2004].

In this contribution, the information acquired from PMx monitoring systems is used to accurately forecast particle concentration using diverse soft computing techniques.

A number of works have been published in the area of airborne particulate forecasting. For example, Chelani [et al 2001] trained hidden-layer neural networks for CO forecasting in India. Caselli [et al 2009] used a feedforward neural network to predict PM10 concentration. Other works, such as Kurt [et al 2010], have constructed neural network models using many input variables (e.g. wind, temperature, pressure, day of the week, date, concentration, etc.), which makes the model overly complex and inaccurate.

However, few works in the scientific literature discuss and compare several robust forecasting methods based on soft computing techniques. These techniques include neuro-fuzzy inference methods, fuzzy clustering techniques and support vector machines. In this chapter, each of these algorithms is described separately and its results are discussed. Furthermore, a comparison of all methods is made to emphasize their advantages as well as their disadvantages.

2. Fuzzy inference methods

Fuzzy inference systems (FIS), also known as fuzzy rule-based systems, are a major unit of a fuzzy logic system. Decision-making is an important part of the entire system: the FIS formulates suitable rules, and decisions are made based on those rules. This relies on the concepts of fuzzy set theory, fuzzy IF–THEN rules, and fuzzy reasoning. A FIS uses “IF - THEN” statements, and the connectors present in the rule statements, “OR” or “AND”, are used to build the necessary decision rules.

A fuzzy inference system consists of a fuzzification interface, a rule base, a database, a decision-making unit, and finally a defuzzification interface, as described in Chang [et al 2006]. A FIS with these five functional blocks is shown in Fig. 1.

Figure 1.

Fuzzy Inference System

The function of each block is as follows:

  • A rule base containing a number of fuzzy IF–THEN rules;

  • A database which defines the membership functions of the fuzzy sets used in the fuzzy rules;

  • A decision-making unit which performs the inference operations on the rules;

  • A fuzzification interface which transforms the crisp inputs into degrees of match with linguistic values; and

  • A defuzzification interface which transforms the fuzzy results of the inference into a crisp output.

The FIS works as follows. The crisp inputs are converted into fuzzy values using a fuzzification method. After fuzzification, the rule base is formed. The rule base and the database are jointly referred to as the knowledge base.

Defuzzification is then used to convert the fuzzy value into a real-world (crisp) value, which is the output.

The steps of fuzzy reasoning (inference operations upon fuzzy IF–THEN rules) performed by FISs are:

  • Compare the input variables with the membership functions on the antecedent part to obtain the membership values of each linguistic label. (This step is often called fuzzification.)

  • Combine (through a specific t-norm operator, usually multiplication or min) the membership values on the premise part to get the firing strength (weight) of each rule.

  • Generate the qualified consequents (either fuzzy or crisp) of each rule depending on the firing strength.

  • Aggregate the qualified consequents to produce a crisp output. (This step is called defuzzification.)

A typical fuzzy rule in a fuzzy model has the format shown in equation 1

$$\text{IF } x \text{ is } A \text{ and } y \text{ is } B \text{ THEN } z = f(x, y) \tag{1}$$

where A and B are fuzzy sets in the antecedent and z = f(x, y) is a function in the consequent. Usually f(x, y) is a polynomial in the input variables x and y that describes the output of the system within the fuzzy region specified by the antecedent of the rule.

A typical rule in a FIS model has the form (Sugeno et al 1988): IF Input 1 = x AND Input 2 = y, THEN Output is z = ax + by + c.

Furthermore, the final output of the system is the weighted average of all rule outputs, computed as

$$\text{Final Output} = \frac{\sum_{i=1}^{N} w_i z_i}{\sum_{i=1}^{N} w_i} \tag{2}$$
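The rule format of Eq. 1 and the weighted average of Eq. 2 can be illustrated with a minimal sketch. It assumes Gaussian membership functions and the product t-norm, neither of which is fixed by the text; the two rules, their parameters and the helper names `gaussian_mf` and `sugeno_output` are hypothetical and chosen only for illustration.

```python
import math


def gaussian_mf(value, center, sigma):
    """Gaussian membership function (an assumed shape, not taken from the chapter)."""
    return math.exp(-0.5 * ((value - center) / sigma) ** 2)


def sugeno_output(x, y, rules):
    """Weighted average of the rule outputs, as in Eq. 2.

    Each rule holds (center, sigma) pairs for the fuzzy sets A and B and the
    coefficients (a, b, c) of the consequent z = a*x + b*y + c.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for rule in rules:
        mu_a = gaussian_mf(x, *rule["A"])  # degree to which "x is A"
        mu_b = gaussian_mf(y, *rule["B"])  # degree to which "y is B"
        w = mu_a * mu_b                    # firing strength (product t-norm)
        a, b, c = rule["coef"]
        z = a * x + b * y + c              # crisp consequent of this rule
        weighted_sum += w * z
        weight_total += w
    return weighted_sum / weight_total


# Two hypothetical rules, for illustration only.
rules = [
    {"A": (0.0, 1.0), "B": (0.0, 1.0), "coef": (1.0, 0.0, 0.0)},
    {"A": (2.0, 1.0), "B": (2.0, 1.0), "coef": (0.0, 1.0, 1.0)},
]
result = sugeno_output(1.0, 1.0, rules)
```

At the input (1, 1) both hypothetical rules fire with the same strength, so the output is simply the mean of the two rule outputs (1 and 2).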

3. Fuzzy clustering techniques

Several fuzzy clustering techniques are available. In this work, two of them have been chosen: fuzzy c-means (FCM) clustering and fuzzy clustering subtractive (FCS) algorithms. According to some authors [Sin, Gomez, Chiu], these are reported to be the most reliable fuzzy clustering methods, as well as better forecasters in terms of absolute error.

Since 1985, when it was proposed by Takagi and Sugeno [Takagi et al 1985, Sugeno et al 1988], this fuzzy model methodology, also known as the TSK model, has been widely applied to theoretical analysis, control applications and fuzzy modelling.

A fuzzy system needs the antecedent and the consequent to express the logical connection between the input and output datasets that are used as a basis to produce the desired system behavior [Sin et al 1993].

3.1. Fuzzy clustering means (FCM)

Fuzzy C-Means clustering (FCM) is an iterative optimization algorithm that minimizes the cost function given by:

$$J = \sum_{k=1}^{n} \sum_{i=1}^{c} \mu_{ik}^{m} \lVert x_k - v_i \rVert^{2} \tag{3}$$

where n is the number of data points, c is the number of clusters, xk is the kth data point, vi is the ith cluster center, μik is the degree of membership of the kth data point in the ith cluster, and m is a constant greater than 1 (typically m = 2) [Aceves et al 2011]. The degree of membership μik is defined by:

$$\mu_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \frac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{2/(m-1)}} \tag{4}$$

Starting with a desired number of clusters c and an initial guess for each cluster center vi, i = 1, 2, 3, …, c, FCM converges to a solution for vi that represents either a local minimum or a saddle point of the cost function [Bezdek et al 1985]. The FCM method uses fuzzy partitioning, such that each point can belong to several clusters with membership values between 0 and 1. FCM includes predefined parameters such as the weighting exponent m and the number of clusters c.
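The iteration described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the chapter: the center update (a mean of the data weighted by μ^m) is the standard FCM update and is not stated in the text, and the data, the initial centers and the stopping tolerance are assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def memberships(points, centers, m=2.0):
    """Eq. 4: degree of membership of every point in every cluster."""
    power = 1.0 / (m - 1.0)  # exponent 2/(m-1), applied to squared distances
    u = []
    for x in points:
        d = [sq_dist(x, v) for v in centers]
        if min(d) == 0.0:
            # The point coincides with a center: full membership there.
            hit = d.index(0.0)
            u.append([1.0 if i == hit else 0.0 for i in range(len(centers))])
        else:
            u.append([1.0 / sum((di / dj) ** power for dj in d) for di in d])
    return u


def update_centers(points, u, m=2.0):
    """Standard FCM center update: mean of the points weighted by mu^m."""
    dim = len(points[0])
    centers = []
    for i in range(len(u[0])):
        w = [row[i] ** m for row in u]
        total = sum(w)
        centers.append(tuple(
            sum(wk * x[d] for wk, x in zip(w, points)) / total
            for d in range(dim)))
    return centers


def cost(points, centers, u, m=2.0):
    """Eq. 3: the objective J that FCM minimizes."""
    return sum(u[k][i] ** m * sq_dist(x, v)
               for k, x in enumerate(points)
               for i, v in enumerate(centers))


def fcm(points, init_centers, m=2.0, max_iter=100, tol=1e-9):
    """Alternate Eq. 4 and the center update until J stops changing."""
    centers = list(init_centers)
    previous = float("inf")
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
        j = cost(points, centers, u, m)
        if abs(previous - j) < tol:
            break
        previous = j
    return centers, memberships(points, centers, m)


# Toy one-dimensional data with two obvious groups (illustrative values only).
points = [(0.8,), (1.0,), (1.2,), (4.8,), (5.0,), (5.2,)]
centers, u = fcm(points, init_centers=[(0.0,), (6.0,)])
```

With this toy data the two centers settle close to 1 and 5, and every row of the membership matrix sums to 1, which is the fuzzy partitioning property mentioned in the text.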

3.2. Fuzzy clustering subtractive

The subtractive clustering method assumes that each data point is a potential cluster center and calculates a measure of the likelihood that each data point would define a cluster center, based on the density of surrounding data points. Consider n data points (x1, x2, …, xn) in an m-dimensional space, each of which is a potential cluster center. The density function Di of the data point xi is given by:

$$D_i = \sum_{j=1}^{n} \exp\left( -\frac{\lVert x_i - x_j \rVert^{2}}{(r_a/2)^{2}} \right) \tag{5}$$

where ra is a positive number that defines the radius of a neighbourhood. The data point with the highest potential is the one surrounded by the most data points; data points farther away than ra have little influence on the density of a given data point.

After calculating the density function of each data point, it is possible to select the data point with the highest potential as the first cluster center. Assuming that xc1 is selected and Dc1 is its density, the density of each data point is then amended by:

$$D_i = D_i - D_{c1} \exp\left( -\frac{\lVert x_i - x_{c1} \rVert^{2}}{(r_b/2)^{2}} \right) \tag{6}$$

The density function of the data points close to the first cluster center is thus reduced, so these data points are unlikely to become the next cluster center. rb defines the neighbourhood in which the density function of the data points is reduced. Usually rb is a constant with rb > ra, in order to avoid cluster centers that lie too close to each other; it is given by [Yager et al 1994]:

$$r_b = \eta \, r_a \tag{7}$$
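Equations 5 to 7 can be combined into a short sketch of the center-selection loop. Several choices here are assumptions and not taken from the chapter: the value η = 1.5, the use of a fixed number of centers as the stopping rule, the toy data, and the function name `subtractive_centers`.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def subtractive_centers(points, ra, eta=1.5, n_centers=2):
    """Pick cluster centers by repeatedly taking the densest data point.

    eta and n_centers are illustrative assumptions; the text only states
    that rb = eta * ra with rb > ra.
    """
    rb = eta * ra  # Eq. 7
    # Eq. 5: density of every data point.
    density = [
        sum(math.exp(-sq_dist(xi, xj) / (ra / 2.0) ** 2) for xj in points)
        for xi in points
    ]
    centers = []
    for _ in range(n_centers):
        best = max(range(len(points)), key=lambda i: density[i])
        centers.append(points[best])
        d_best = density[best]
        # Eq. 6: reduce the density of points near the chosen center.
        density = [
            density[i]
            - d_best * math.exp(-sq_dist(points[i], points[best]) / (rb / 2.0) ** 2)
            for i in range(len(points))
        ]
    return centers


# Toy one-dimensional data: a group of four points near 5 and three near 1.
points = [(1.0,), (1.1,), (0.9,), (5.0,), (5.1,), (4.9,), (5.05,)]
centers = subtractive_centers(points, ra=1.0)
```

The denser group near 5 yields the first center. Its neighbours then have their density reduced by Eq. 6, so the second center comes from the group near 1.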

4. Support vector machines

Support vector machine (SVM) theory was developed by Vapnik in 1995 and is applied in many machine-learning applications such as object classification, time series prediction, regression analysis and pattern recognition. SVMs are based on the principle of structural risk minimization (SRM) [Vapnik et al 1995, 1997].

In SVM analysis, the main idea is to map the original data x into a feature space F of higher dimensionality via a non-linear mapping function φ, which is generally unknown, and then carry out linear regression in the feature space [Vapnik 1995]. Thus, the regression approximation addresses the problem of estimating a function based on a given data set, which is produced from that function. The SVM method approximates the function by:

$$y = \sum_{i=1}^{m} w_i \phi_i(x) + b = w^{T} \phi(x) + b \tag{8}$$

where w = [w1, …, wm] is the weight vector, b is the bias coefficient and φ(x) = [φ1(x), …, φm(x)] is the basis function vector.

The learning task is transformed into the minimization of the error function while keeping the weights of the network at a minimum. The error function is defined through the ε-insensitive loss function, Lε(d, y(x)), where d denotes the desired output, and is given by:

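$$L_\varepsilon(d, y(x)) = \begin{cases} |d - y(x)| - \varepsilon & \text{if } |d - y(x)| \geq \varepsilon \\ 0 & \text{otherwise} \end{cases} \tag{9}$$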
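As a small illustration of Eq. 9, the loss can be written as a one-line function. The numeric values below are arbitrary examples; only the ε values of 11 and 13 echo those used later in the chapter.

```python
def eps_insensitive_loss(d, y, eps):
    """Eq. 9: errors smaller than eps cost nothing, larger ones cost the excess."""
    err = abs(d - y)
    return err - eps if err >= eps else 0.0


# Arbitrary desired output d and model outputs y, for illustration only.
inside_tube = eps_insensitive_loss(30.0, 38.0, eps=11.0)     # |error| = 8 < 11
outside_tube_11 = eps_insensitive_loss(30.0, 45.0, eps=11.0)  # |error| = 15
outside_tube_13 = eps_insensitive_loss(30.0, 45.0, eps=13.0)  # same error, wider tube
```

A larger ε widens the tolerance band around the model output, so the same deviation is penalized less.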

The optimization problem so defined is solved by introducing the Lagrange multipliers αi, αi* (where i = 1, 2, …, k), responsible for the functional constraints defined in Eq. 9. The minimization of the Lagrange function is then transformed into the dual problem [Vapnik et al 1997]:

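$$\phi(\alpha, \alpha^{*}) = \left[ \sum_{i=1}^{k} d_i (\alpha_i - \alpha_i^{*}) - \varepsilon \sum_{i=1}^{k} (\alpha_i + \alpha_i^{*}) - \frac{1}{2} \sum_{i=1}^{k} \sum_{j=1}^{k} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*}) K(x_i, x_j) \right] \tag{10}$$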

With constraints:

$$\sum_{i=1}^{k} (\alpha_i - \alpha_i^{*}) = 0, \quad 0 \leq \alpha_i \leq C, \quad 0 \leq \alpha_i^{*} \leq C \tag{11}$$

where C is a regularization constant that determines the trade-off between the training risk and the model uniformity.

According to the nature of quadratic programming, only those data corresponding to non-zero (αi − αi*) pairs are referred to as support vectors (their number is denoted Nsv). In Eq. 10, K(xi, xj) = φ(xi)·φ(xj) is the inner-product kernel, which satisfies Mercer’s condition [Osuna et al 1997], as required for the generation of kernel functions, and is given by:

$$K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle \tag{12}$$

Thus, the output y(x), expressed in terms of the support vectors and the input training data x, is defined by:

$$y(x) = \sum_{i=1}^{N_{sv}} (\alpha_i - \alpha_i^{*}) K(x, x_i) + b \tag{13}$$

where xi are the learning vectors. This leads to the SVM architecture shown in Fig. 2 [Vapnik 1997, Cristianini et al 2000].
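The prediction rule of Eq. 13 can be sketched once a kernel has been chosen. The kernel parametrizations below (Gaussian with width σ, polynomial with an added constant of 1) are common forms assumed for illustration, since the text does not spell them out. The support vectors, the coefficients (αi − αi*) and the bias are made-up numbers, not values obtained by training.

```python
import math


def gaussian_kernel(x, xi, sigma=2.0):
    """Gaussian kernel exp(-||x - xi||^2 / (2 sigma^2)); an assumed parametrization."""
    sq = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-sq / (2.0 * sigma ** 2))


def polynomial_kernel(x, xi, degree=3, coef0=1.0):
    """Polynomial kernel (x . xi + coef0)^degree; an assumed parametrization."""
    return (sum(a * b for a, b in zip(x, xi)) + coef0) ** degree


def svr_predict(x, support_vectors, dual_coefs, bias, kernel):
    """Eq. 13: sum of (alpha_i - alpha_i*) K(x, x_i) over the support vectors, plus b."""
    return sum(c * kernel(x, sv) for c, sv in zip(dual_coefs, support_vectors)) + bias


# Hypothetical, untrained values used only to exercise the formula.
support_vectors = [(0.0,), (1.0,)]
dual_coefs = [0.5, -0.25]  # stands for (alpha_i - alpha_i*)
bias = 1.0

y_gauss = svr_predict((0.0,), support_vectors, dual_coefs, bias, gaussian_kernel)
y_poly = svr_predict((0.0,), support_vectors, dual_coefs, bias, polynomial_kernel)
```

Only the support vectors enter the sum, which is why the number of support vectors determines the size of the resulting model.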

Figure 2.

Support Vector Machine Architecture.

Figure 3.

Support Vector Machine Methodology.

The methodology used for the design, training and testing of the SVM is proposed as follows, based on a review of Vapnik, Osowski [et al 2007] and Sapankevych [et al 2009]:

  • Preprocess the input data and select the most relevant features, scale the data in the range [−1, 1], and check for possible outliers.

  • Select an appropriate kernel function that determines the hypothesis space of the decision and regression function.

  • Select the parameters of the kernel function (e.g. the variances of the Gaussian kernels).

  • Choose the penalty factor C and the desired accuracy by defining the ε-insensitive loss function.

  • Validate the model obtained on test data not seen during training and, if the result is not satisfactory, iterate between the third (or, eventually, the second) and the fifth steps.
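Two of the steps above lend themselves to a short sketch: scaling the data to the range [−1, 1], and iterating over candidate parameters until the validation error is acceptable. The helper names `scale_to_range` and `select_parameters`, the candidate values and the toy error function are all hypothetical. In practice the error function would train an SVM and score it on unseen data.

```python
from itertools import product


def scale_to_range(values, lo=-1.0, hi=1.0):
    """Linearly rescale values so the minimum maps to lo and the maximum to hi."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Constant input: map everything to the middle of the target range.
        return [(lo + hi) / 2.0 for _ in values]
    return [lo + (hi - lo) * (v - v_min) / (v_max - v_min) for v in values]


def select_parameters(eps_values, sigma_values, validation_error):
    """Return the (eps, sigma) pair with the lowest validation error.

    validation_error is a caller-supplied function; here it stands in for
    "train the SVM with these parameters and score it on unseen data".
    """
    return min(product(eps_values, sigma_values),
               key=lambda pair: validation_error(*pair))


scaled = scale_to_range([10.0, 20.0, 30.0])


def toy_error(eps, sigma):
    """Made-up error surface, used only to show the selection loop."""
    return (eps - 10.0) ** 2 + (sigma - 2.0) ** 2


best = select_parameters([11.0, 13.0], [1.0, 2.0, 4.0], toy_error)
```

Keeping the validation step as a separate function makes it easy to swap kernels or parameter grids without touching the preprocessing.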

5. Discussion of results

Simulations were performed with the fuzzy clustering algorithms using equations 3 to 7. In this case study, the datasets from Mexico City in 2007 were chosen to construct the fuzzy model. Likewise, the data of 2008 and 2009 from the same geographic zone in each case were used for training and validating the model, respectively. The result of the fuzzy clustering model was then compared to the real data of Northwest Mexico City in 2010.

The results obtained show an average least mean square error of 11.636 using Fuzzy Clustering Means (FCM), whilst Fuzzy Clustering Subtractive (FCS) shows an average least mean square error of 10.59. Table 1 shows a list of the experiments carried out. An example of these results is shown in figure 4 for FCM, whilst figure 5 shows the estimation made using FCS at Northwest Mexico City.

Figure 4.

Fuzzy Clustering Means (FCM) Results at Northwest Mexico City. Raw Data vs. Fuzzy Model

Figure 5.

Fuzzy Clustering Subtractive (FCS) Results at Northwest Mexico City. Raw Data vs. Fuzzy Model

In figures 4 and 5, the raw data (shown as a blue solid line) and the constructed fuzzy model (shown as a dashed-starred green line) show that the trained model approximates the raw data with an average least mean square error of 8.7%, implying that a fuzzy model can be accurately constructed using this technique.

Site         LMSE using FCM    LMSE using FCS
Northwest    10.1917           7.4807
Northeast    13.6282           13.7374
Center       18.5757           15.1409
Southwest    5.0411            7.4953
Southeast    10.7428           9.1188

Table 1.

List of the experiments carried out using FCM and FCS.
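The average errors quoted in the text (11.636 for FCM and 10.59 for FCS) can be reproduced directly from the values in Table 1. The short check below uses only those tabulated numbers; the variable names are illustrative.

```python
# LMSE values from Table 1: site -> (FCM, FCS)
lmse = {
    "Northwest": (10.1917, 7.4807),
    "Northeast": (13.6282, 13.7374),
    "Center": (18.5757, 15.1409),
    "Southwest": (5.0411, 7.4953),
    "Southeast": (10.7428, 9.1188),
}

mean_fcm = sum(fcm for fcm, _ in lmse.values()) / len(lmse)
mean_fcs = sum(fcs for _, fcs in lmse.values()) / len(lmse)

# Sites with the lowest and highest error for each method.
best_fcm = min(lmse, key=lambda site: lmse[site][0])
best_fcs = min(lmse, key=lambda site: lmse[site][1])
worst_fcm = max(lmse, key=lambda site: lmse[site][0])
worst_fcs = max(lmse, key=lambda site: lmse[site][1])

print(f"Mean LMSE, FCM: {mean_fcm:.4f}")
print(f"Mean LMSE, FCS: {mean_fcs:.4f}")
```

The same dictionary also shows that the city center has the highest error for both methods.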

Table 1 shows that, in terms of error, the best prediction with fuzzy clustering means is obtained at the Southwest site, while fuzzy clustering subtractive performs almost equally well at the Northwest and Southwest sites; the least accurate estimation for both methods is obtained at the city center. This may be due to the high variation in PM10 particle concentration there, which makes it more difficult to predict. However, more research is needed to confirm this.

Furthermore, detailed simulations were carried out using Support Vector Machines following the proposed methodology shown in figure 3. These simulations used the same dataset as the fuzzy clustering technique. In this case, a σ value of 2 was chosen, together with ε values of 11 and 13, since these were shown to give better results in previous contributions (Sotomayor et al 2010, Sotomayor et al 2011). Figure 6 shows the results of the model using support vector machines with a Gaussian kernel, whilst figure 7 shows the results using the same datasets with a polynomial kernel.

Figure 6.

SVM Results at Northwest Mexico City using Gaussian Kernel. a) SVM estimated with free parameters of ε = 13 and σ = 2; b) SVM estimated with free parameters of ε = 11 and σ = 2

Figure 6 summarizes the results, showing the support vector machine estimate (red circles), the raw data (black crosses) and the behavior of the data (solid black line). With the Gaussian kernel and an ε of 13 (fig. 6a), the error is 11.8, computed with the same LMSE algorithm as for the fuzzy model, with a total of 157 support vectors. In figure 6b, also using the Gaussian kernel, the same σ and an ε of 11 were used. In this case, the support vector machine shows an improvement, with an LMSE of 8.7.

Figure 7.

SVM Results at Northwest Mexico City using Polynomial Kernel. a) SVM estimated with free parameters of ε = 13 and σ = 2; b) SVM estimated with free parameters of ε = 11 and σ = 2

For figure 7a, the estimation gives an error of 9.8 using a σ of 2 and an ε of 11, with 177 support vectors. Likewise, figure 7b shows the estimation using a third-degree polynomial kernel with an ε of 13. In this case, an LMSE of 10.1 is obtained with 183 support vectors.

6. Conclusions and further work

An assessment of the performance of both fuzzy systems, generated using Fuzzy Clustering Subtractive and Fuzzy C-Means, was made taking into account the number of membership functions, rules, and the Least Mean Square Error for PM10 particles. As a case study, estimations were made at Northwest Mexico City in 2010, giving consistent results.

In the case of SVMs, it can be concluded that for this case study an ε of 11 gives a better estimation than an ε of 13 for the Gaussian kernel. In general, the Gaussian kernel gives better results in terms of estimation than its corresponding polynomial kernel. In general terms, fuzzy clustering gives a better estimation than the Gaussian and polynomial kernels, although in-depth studies are needed to corroborate these results for other scenarios.

For future work, more SVM kernels can be implemented and compared to find out which kernels give better estimation. Also, SVMs can be implemented along with other techniques, such as the wavelet transform, to improve the performance of these algorithms.

© 2011 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike-3.0 License, which permits use, distribution and reproduction for non-commercial purposes, provided the original is properly cited and derivative works building on this content are distributed under the same license.

How to cite and reference

Aceves-Fernandez Marco Antonio, Sotomayor-Olmedo Artemio, Gorrostieta-Hurtado Efren, Pedraza-Ortega Jesus Carlos, Ramos-Arreguín Juan Manuel, Canchola-Magdaleno Sandra and Vargas-Soto Emilio (July 5th 2011). Advances in Airborne Pollution Forecasting Using Soft Computing Techniques, Air Quality - Models and Applications, Dragana Popović, IntechOpen, DOI: 10.5772/16273. Available from:
