Strategies for Estimation of Gas Mass Flux Rate Between Surface and the Atmosphere

two different strategies for addressing the inverse problem: formulated as an optimization problem, and neural network. The optimization problem has been solved trans-boundary environmental High concentrations of air pollutants due to numerous anthropogenic activities influence the air quality. the area of air quality monitoring, modelling, and in with various monitoring techniques of and control. case studies describing the exposure and health implications of air pollutants on living biota in


Introduction
A relevant issue nowadays is the monitoring and identification of the concentration and rate flux of the gases from the greenhouse effect. Most of these minority gases belong to important bio-geochemical cycles between the planet surface and the atmosphere. Therefore, there is an intense research agenda on this topic. Here, we are going to describe the effort for addressing this challenging. This identification problem can be formulated as an inverse problem. The problem for identifying the minority gas emission rate for the system ground-atmosphere is an important issue for the bio-geochemical cycle, and it has being intensively investigated. This inverse problem has been solved using regularized solutions (Kasibhatla, 2000), Bayes estimation (Enting, 2002;Gimson & Uliasz, 2003), and variational methods (Elbern et al., 2007) -the latter approach coming from the data assimilation studies.
The inverse solution could be computed by calculating the regularized solutions. Regularization is a general mathematical procedure for dealing with inverse problems, looking for the smoothest (regular) inverse solution. For this approach, the inverse problem can be formulated as an optimization problem with constrains (a priori information). These constrains can be added to the objective function with the help of a regularization parameter. For adding a regularization operator, the ill-posed inverse problem becomes in a well-posed one. The maximum second order entropy principle was used as a regularization operator (Ramos et al., 1999), and the regularization parameter is found by the L-curve technique. A recent approach for solving inverse problems is provided by the use of artificial neural networks (ANN). Neural networks are non-linear mappings, and it is possible to design an ANN to be one inverse operator, with robustness to deal on noisy data.
We have applied two different strategies for addressing the inverse problem: formulated as an optimization problem, and neural network. The optimization problem has been solved swarm intelligence, artificial immune systems and others. Different from other methodologies for computing inverse solutions, neural networks do not need the knowledge on forward problem. In other words, neural networks can be used as inversion operator without a mathematical model to describe the direct problem. Several architectures can be adopted to implement a neural network, here will be adopted the multi-layer perceptron with back-propagation algorithm for learning process. Other approaches to implement artificial neural network for inverse problem are described by Campos Velho (2011).

Regularization theory
Regularization is recognized as the first general method for solving inverse problems, where the regularization operator is a constrain in an optimization problem. Such operator is added to the objective function with the help of a Lagrangian multiplier (also named the regularization parameter). The regularization represents an a priori knowledge from the physical problem on the unknown function to be estimated. The regularization operator moves the original ill-posed problem to a well posed one. Figure 1 gives an outline of the idea behind the regularization scheme, one of most powerful techniques for computing inverse solutions. The regularization procedure searches for solutions that display global regularity. In the mathematical formulation of the method, the inverse problem is expressed as optimization problem with constraint: represents the forward problem, and [] x  is the regularization operator (Tikhonov & Arsenin, 1977). The problem (1) can be written as an optimization problem without constrains with the help of a Lagrange multiplier (penalty or regularization parameter): being  the regularization parameter. The first term in the objective function (2) is the fidelity of the model with the observation data, while the second term expresses the regularity (or smoothness) required from the unknown quantity. Note that for   0, the fidelity term is overestimated; on the other hand, for    all information in the mathematical model is lost. A definition of a family of regularization operator is given below.
Definition 1: A family of continuous regularization operators : is called a regularization scheme for the inverse operation of ( ) where x  X, and  is regularization parameter.
The expression (2) is a practical implementation of the Definition 1. Several regularizations operators have been derived from the pioneer works. Here, only one class of these regularization operators will be described.

Entropic regularization
The maximum entropy (MaxEnt) principle was firstly proposed as a general inference procedure by Jaynes (1957), on the basis of Shannon's axiomatic characterization of the amount of information (Shannon & Weaver, 1949). It has emerged at the end of the 60's as a highly successful regularization technique. Similar to others regularization techniques, MaxEnt searches for solutions with global regularity. Employing a suitable choice of the penalty or regularization parameter, MaxEnt regularization identifies the smoothest reconstructions which are consistent with the measured data. The MaxEnt principle has successfully been applied to a variety of fields: pattern recognition (Fleisher et al., 1990), computerized tomography (Smith et al., 1991), crystallography (de Boissieu et al., 1991), non-destructive testing (Ramos & Giovannini, 1995), magnetotelluric inversion (Campos Velho & Ramos, 1997).
For the vector of parameters x i with nonnegative components, the discrete entropy function S of vector u is defined by The (nonnegative) entropy function S attains its global maximum when all s q are the same, which corresponds to a uniform distribution with a value of max log SN  , while the lowest entropy level, min 0 S  , is attained when all elements s q but one are set to zero.
Following the Tikhonov regularization, new entropic higher order regularization techniques have been introduced: MinEnt-1 -applied to identify the 2D electric conductivity from electro-magnetic data (Campos Velho & Ramos, 1997), MaxEnt-2 -employed to the retrieval of vertical profiles of temperature in the atmosphere from remote sensing data (Ramos et al., 1999). They represent a generalization of the standard MaxEnt regularization method, allowing a greater flexibility to introduce information about the expected shape of the true physical model, or its derivatives, for the inverse solution.
For defining higher order entropy functions, two approaches can be derived from the maximization of the entropy of the vector of first-and second-differences x. Under the assumption x min < x i < x max , (i=1, …, N), the methods of maximum entropy of order 1 (MaxEnt-1) and maximum entropy of order 2 (MaxEnt-2) are defined below (Muniz et al., 2000): where  is a small parameter ( = 10 -10 ). The argument in the Eq. (4) is to be considered with The non-extensive formulation for the entropy proposed by Tsallis (Tsallis, 1988) can be used to unify the Tikhonov and entropic regularizations (Campos Velho et al., 2006, see also Campos Velho, 2011. The Tsallis' non-extensive entropy is expressed as The non-extensive parameter q plays a central role in the Tsallis' thermostatistics. The Boltzmann-Gibbs-Shannon's entropy is recovered setting q=1. However, for q=2, the maximum non-extensive entropy principle is equivalent to the standard Tikhonov regularization. Of course, for q  1 and 2, we have a new regularization operator. In context of inverse problem, the Boltzmann constant (k) is assumed to be equal to 1.
In order to have a complete theory, the regularization operator should be known, and it is also necessary to have a scheme to compute the regularization parameter. Several methods have been developed to identify this parameter: Morosov's discrepancy criterion, Hansen's method (L-curve method), and generalized cross validation are the most used schemes.
The Morosov's criterion is based on the difference between data of the mathematical model and observations. It should have the same magnitude as measurement errors (Morosov, 1984; see also Morosov & Stessin, 1992). Therefore, if  is the error in the measure process,  is the root of the following equation: For determining N parameter in an inverse problem, and assuming that measurement errors could be modeled by a Gaussian distribution with  2 variance, the discrepancy criterion for independent measures could expressed as: The discrepancy principle can also be applied to the higher entropy regularization (Muniz et al., 2000). However, if the probability density function of measurement errors follows a distribution where the second statistical moment is not defined (Lévy or Cauchy distributions, for example) the generalized Morosov's discrepancy principle can be used (Shiguemori et al., 2004).
If the statistics on the observational data is not available, the generalized cross-validation method can be applied (Bertero & Bocacci, 1998;Aster et al., 2005). Considering now a linear forward problem: A(x)Ax, being A a matrix, the goal of the cross-validation scheme is to minimize the generalized cross-validation function (Aster et al., 2005): being Tr{C} the trace of matrix C, and () B  is the following matrix: where A * is the adjoint matrix: and I is the identity matrix.
Another scheme to compute the regularization parameter is the L-curve method. The Lcurve criterion is a geometrical approach suggested by Hansen (1992) (see also Bertero & Bocacci, 1998;Aster et al., 2005). The idea is to find the point of maximum curvature, on the corner of the plot: . In general, the plot smoothness × fidelity shows a Lshape curve type.

Optimization methods
This is a central issue in science and engineering, where several methods have been developed. They can classified as a deterministic and stochastic schemes. Here, only the methods used in the application treated will be described.

Quasi-Newton optimization
The minimization of the objective function J(x) given by equation (2), subjected to simple bounds on x, is solved using a first-order optimization algorithm from the E04UCF routine, NAG Fortran Library (1993). This routine is designed to minimize an arbitrary smooth function subject to constraints (simple bounds, linear and nonlinear constraints), using a sequential programming method. For the n-th iteration, the calculation proceeds as follows: where  specifies the accuracy to which one wishes to approximate the solution of the problem. Otherwise, return to step 1.

Particle swarm optimization
The particle swarm optimization (PSO) is a heuristic search method the uses the behavior of biological collective behavior, like birds, fishes or bees to drive the artificial particles in a search pattern that equalizes a global search with a local search, in a way that an optimum, or near to optimum, result is found (Kennedy & Eberhart, 1995, 2001. There is a sociocognitive theory for supporting the particle swarm scheme. The cultural adaptive process encloses two components: high-level (pattern formation trough the individual and the ability of problem solving), and a low-level (individual behavior), with some skills for evaluating, comparing, and emulating (Kennedy & Eberhart, 2001).
The tendency to evaluate is, maybe, the behavioral feature more present in several living beings. This is intrinsic for the learning process. The comparison among individuals from a population is a behavior process used to identify patterns to find a way to improve each one quality. The emulation behavior gives the living beings the ability to learn through the observations of others actions. This behavior is not so much common in the nature. The emulation behavior is closer to the human society, but it can be used to improve the performance of the algorithms. These three concepts can be combined, even in simplified computational rules, providing a scheme for solving optimization problems.
PSO searches for the optimum in an infinite ( ∞ ) or  N (N-dimensional space of real numbers,). Indeed, computational solutions can only obtain a projection of an infinite space into  N space. The algorithm allows the particles to fly in the search space, looking for the optimization of a predetermined function. A special position x * of a particle in the search space represents the solution of the problem. This position can be represented by an vector of real numbers x=(x 1 , x 2 , …, x N ). In the PSO, the solution x * is searched in a iterative procedure. At each time step, or iteration of the algorithm, the position of the each particle x k is updated, by the addition of a speed vector v k , in each dimension (space direction). The iteration could be given by where x k (t-1) is the previous position of the particle. The velocity that is applied to the particle is function of two basic behaviors: a cognitive component, and a social component. The velocity is up dated taken into account the behaviors mentioned, as following: vt w vt c r a n dp xc r a n d p x      where c 1,2 are constants (real numbers) that weights the cognitive and social components, respectively, rand 1,2 are random numbers from a uniform distribution in the interval [0,1], p k is the best position achieved by the particle and p g is the best position of the swarm, with x k actual position of the particle k. The parameter w is used to give more stability to the algorithm (Shi & Eberhart, 1998). The selection of the w value follows a heuristic rule:  If w < 0.8: the algorithm works in a local search mode, in a exploration mode;  If w > 1.2: the algorithm works in a exploitation mode, where the global is more explored;  If 0.8 < w < 1.2: a balance point, where exploration and exploitation are well divided.

Artificial neural networks
The artificial neuron is a simplified representation of a biological neuron, where a weighted combination of the inputs is the value for the non-linear activation function (in general, a sigmoid one). In artificial neural networks (ANN), the identification of the connection weights is called the learning (or training) phase.
Several architectures can be used for the ANN. Related to the learning process, ANNs can be described into two important groups: supervised, and unsupervised NNs. For the supervised scheme, the connection weights are selected to become the output from the ANN as close as possible to the target. For example: the weights could be found by minimizing a functional of the square differences between the ANN output and the target values. Some authors have suggested the use of a Tikhonov's functional (Poggio & Girosi, 1990). The regularization suggested by Poggio & Girosi (1990) is applied on the output values x of the NN: where x is the NN output, w is the connection weight matrix, and P is a linear operator. Shiguemori et al. (2004) employed three different architectures for the NNs, with focus on determination of the initial profile for the heat conduction problem. The NNs used were: multilayer perceptron (MLP), RBF-NN, and cascade correlation. The backpropagation algorithm was used for the learning process (without regularization). Figure 2 shows a cartoon to represent a MLP-NN. Cybenko (1989) has proved that the standard multilayer NN with a single hidden layer with a sigmoid activation function, for a finite number of hidden neurons, are universal approximators on a compact subset of  N (Cybenko theorem).
The backpropagation training is a supervised learning algorithm that requires both input and output (desired) data. Such pairs permit the calculation of the error of the network as the difference between the calculated output and the desired vector:  1/2  x targetx(w). The www.intechopen.com weight adjustments are conducted by backpropagating such error to the network, governed by a change rule. The weights are changed by an amount proportional to the error at that unit, times the output of the unit feeding into the weight. Equation below shows the general weight correction according to the so-called delta rule (Haykin, 1999) [ where,  j is the local gradient, x i is the input signal of neuron j,  is the learning rate parameter that controls the strength of change, and  the momentum coefficient, which decides to what degree this previous adjustment is to be considered so as to prevent any sudden changes in the direction in which corrections are made. The learning rate  and momentum  were set to 0.1 and 0.5, respectively. Nevertheless, the activation test is an important procedure, indicating the performance of a NN. The effective test is defined using an unknown vector x that did not belong to the training set. This action is called the generalization of the NN. The up date for the weight matrix of neural connections is obtained with the innovation described in Equation (13): where the iterative process follows up to reach a maximum number of epochs (iterations) or the required precision to be caught. The delta rule can be justified if the diagonal matrix I (the learning parameter multiply by identity matrix) could be understood as an approximate inverse of Hessian matrix (a matrix C is defined as an approximate inverse if: ||I -CA|| < 1, see Conte and de Boor (1980)). Therefore, the free parameter  should determined to be an approximation for the inverse of the Hessian matrix, producing similar results as the Newton method. However, the Newton method, or even the steepest descent scheme, is not good to deal if a shallow local minimum. The momentum term ( 0, in Eq. (13)) is added to circumvent some instabilities (Rumelhartet al., 1986) and avoid pathological behaviour with shallow error surface (Haykin, 1999). However, any other optimization method can be used instead of delta rule. Indeed, the PSO scheme described in Section 2.1.2.2 can be employed to determine the weights of the neural network (Li and Chen, 2006).

The lagrangian stochastic model LAMBDA
The Lagrangian particle model LAMBDA was developed to study the transport process and pollutants diffusion, starting from the Brownian random walk modeling (Anfossi & Ferrero, 1997). In the LAMBDA code, full-uncoupled particle movements are assumed. Therefore, each particle trajectory can be described by the generalized three dimensional form of the Langevin equation for velocity (Thomson, 1987): , and x is the displacement vector, U is the mean wind velocity vector, u is the Lagrangian velocity vector, (, ,) (Degrazia et al., 2000).
The drift coefficient, (, ,) i at xu   , for forward and backward integration is given by for forward integration and 1 v c  for backward integration,

 
,, is the non-conditional PDF of the Eulerian velocity fluctuations, and ,, , (1 / 2) i j ik j k Bb b  . Of course, for backward integration, the time considered is '  tt , and velocity '  UU , being U the mean wind speed. The horizontal PDFs are considered Gaussians, and for the vertical coordinate the truncated Gram-Charlier type-C of third order is employed (Anfossi & Ferrero, 1997). The diffusion coefficients, ( , , ) ij bt xu , for both forward and backward integration is given by where ij  is the Kronecker delta, 2 i  and Li  are velocity variance at each component and the Lagrangian time scale (Degrazia et al., 2000), respectively. With the coordinates and the mass of each particle, the concentration is computed (see below).
The inverse problem here is to identify the source term S(t). A source-receptor approach can be employed for reducing the computer time, instead of running the direct model, Eq. (14), for each iteration. This approach displays an explicit relation between the pollutant concentration of the i-th receptor related the j-th sources: where the matrix ij M is the transition matrix, and each matrix entry given by -forw ard m odel; -backward model.

Determining gas flux between the ground and the atmosphere
For testing the procedure to estimate the emission rate procedure, it is considered the area pollutant sources placed in a box volume, where the horizontal domain and vertical height are given by: (1500 m  1000 m)  1000 m. There are two embedded regions R 1 and R 2 into computational domain, with following horizontal domain (600 m  600 m) for each region, and 1 m of height, and they are realising contaminants with two different emission rates. Figure 1 shows the computational scenario in a two-dimensional projection ( , xy): the six sensors are placed at 10 m height and spread horizontally with the coordinates presented in Table 1

SSSSSSSSSSSSS 
The dispersion problem is modelled by the LAMBDA backward-time code, where data from the Copenhagen experiment are used, for the period (12:33 h -12:53 h, 19/October/1978) (Gryning & Lyck, 1984. It is assumed a logarithmic vertical profile for the wind field being U(z) the main stream, u * the fiction velocity,  the von Kármán constant, z the height above the ground, and z 0 =0.06 m the roughness. The wind speed was measured at 10 m, 120 m, and 200 m. The numerical value for u * was obtained from the best fitting with the measured wind speed -see Table 2 -and the equation (19). 1  400  500  2  600  300  3  800  700  4  1000  500  5  1200  300  6 1400 700 Table 1. Position of sensors in physical domain.
For this experiment the wind direction had an angle =180°, and the boundary layer height is h=1120 m (Gryning & Lyck, 1998). The turbulence parameterization follows Degrazia et al. (2000), for computing the wind variance ( 2 i  ) and the Lagrangian decorrelation time scale ( i L T ). These two turbulent parameters are considered constant for the whole boundary layer, and their numerical values were calculate at z=10 m level. This characterises a stationary and vertically homogeneous atmosphere.
The sensors dimension, where the fictitious particles have arrived, was de 0,1 m  0,1 m  0,1 m, centred in the computational cells presented in Table 1. Next, the reverse trajectories are calculated for 1000 particles per sensor. The parameters for numerical simulations were 1800 time-steps with t  = 1 s, meaning 30 min for the whole simulation. After 10 min, the concentration was computed at each 2 min, for the remaining 20 min of simulation. The mean concentration is found using the following expression:  (Gryning & Lyck, 1998).
The inverse approach is tested using synthetic measured concentration data The synthetic observations are emulated as: where C Mod (x,y) is the gas concentration computed from the forward model (14),  is a random number associated to the Gaussian distribution with zero mean and unitary standard deviation, and  is the level of the noise (Gaussian white noise). In our tests, =0.05 (Experiment-1) and =0.1 (experiment-2) were used. In other words, two numerical experiments are performed, using two different maximum noise levels: 5% and 10%.
The optimization method is used to find the minimum for the functional:  Figure 3. However, some sub-domains are considered releasing nothing. Therefore, the number of the unknown parameters to be estimated by inverse method decreases to 12 cells: 2347891 21 31 41 71 81 9 [,,,,,,,,,,,] AAAAAAA . Another a priori information is smoothness of the inverse solution (regularization operator). As mentioned before, the second order maximum entropy is applied, Eq. (13c). The use of entropic regularization for this type of inverse problem was independently proposed by Roberti (2005) and Bocquet (2005aBocquet ( , 2005b) -see also Roberti et al. (2007) and Davoine & Bocquet (2007).
There are several methods to compute the regularization parameter. Here, a numerical experimentation was employed to determine this parameter. Roberti (2005) had found the values  = 10 -6 for Experiment-1, and  = 10 -5 for Experiment-2.
The first inverse solution was obtained by applying the quasi-Newton optimization method (described in Section 2.1.1). In the Experiment-1, a global error around 2% is found for the inverse solution. For the region-1 (lower emission rate), the error in the estimation is 10%, and for region-2 the error is 3%. The biggest errors for estimation are verified for subdomains A 9 and A 14 , with error of 55% and 42%, respectively. In order to try some improvement for the inverse solution, the minimum value for the functional (22) is computed using the PSO scheme. This method is a stochastic technique and it deals with a population (candidate inverse solutions). Therefore, the numerical results represent the average from 25 seeds used by the PSO algorithm, working with 12 particles, with the following parameters: w = 0.2, c 1 = 0.1, and c 2 = 0.2. The total error obtained with the PSO was less than inverse solution computed from the quasi-Newton method. It is also important to point out that for the PSO approach no regularization operator was used ( = 0 in Eq. (22)).
The inverse problem was also addressed by neural network. The MLP-NN was designed and trained to find the gas flux between the soil and the atmosphere. However, there are two previous steps before the application of MLP-NN. Firstly, it is necessary to determine the appropriated activation function, number of hidden layers for the NN, and the number of neurons for each layer. Secondly, the values of the connection weights (learning process) must be identified.
The hyperbolic tangent was used as activation function. Different topologies for MLP-NN were tested with 6 inputs (measured points for 6 different sensors) and 12 outputs (emission for each subdomain). Good results were achieved for two neural networks with 2 hidden layers: NN-1 with 6 and 12 (6:6:12:12), and NN-2 with 15 and 30 (6:15:30:12) neurons for the hidden layers, respectively.

Conclusions
The estimation of gas flux between the surface and the atmosphere is a key issue for several objectives: air pollution quality, climate change monitoring, short and medium range weather forecasting, and ecology. This lecture presents two formulations to address this important inverse problem: optimization problem approach, and artificial neural networks.
When the inverse problem is formulated as an optimization problem, a regularization operator is, in general, required. The optimal solution could be computed using a determinisc or a stochastic searching methods. Deterministic methods has a faster convergence, but stochastic schemes can overcome the local minima. The regularization strategy requires the estimation of the regularization parameter, responsible by a fine balance between adhesion term (square difference between measurements and model data) and of the regularization function. For the present estimation, the PSO inverse strategy does not use the regularization. This is an advantage, but there are much more calculations of the objective function than the determinisc method.
The artificial neural networks produced the better results for the worked example. More investigations are needed to confirm this good performance. We are generating different weather conditions, for all seasons of the year and for different latitudes, to provide more robust conclusions. Any way, the results are very encouraging. As a final remark, two topologies for MLP neural network were identified. Of course, the best result is an obvious criterion to select the topology, but an additional criterion could be based on the complexity: less complex NN (less neurons) is better. Using these two criteria, the NN-1 (6:6:12:12) could be better than NN-2 (6:15:30:12).

www.intechopen.com
Air Pollution -Monitoring, Modelling and Health

278
The methodology for computing inverse solution was described for in situ measurements.
The new wave of environmental satellites already started (Clerbaux et al., 1999;Buchwitz et al., 2004;Crevoisier et al., 2004;Carvalho & Ramos, 2010). The next step is to use a cascade of inversions to estimate the surface flux. The first inversion is to determine the gas vertical profile concentration from the radiances measured by the satellite. The second inversion would be to identify the gas surface flux from the gas concentration found by the first inversion. We are going to adapt the methodology described here for this new observation data.