Summary of scientific notation and most used acronyms.

## 1. Introduction

Over the last two decades, interest in estimating Earth surface parameters from remotely sensed data has grown within the scientific community. Within this field, one of the most challenging and attractive problems is the estimation of soil moisture (SM) and vegetation water content (VWC), as both are fundamental in many disciplines.

The prediction of SM variations is equally important at the mesoscale and at smaller scales. Mesoscale atmospheric models have demonstrated sensitivity to spatial gradients, while at field level SM can be regarded as the water stored between rainfall and evaporation, thus acting as a regulator of fundamental hydrologic processes such as infiltration and runoff (Delworth, 1988).

Surface SM information is also a critical forcing variable in many Soil Vegetation Atmosphere Transfer (SVAT) models which are able to estimate SM values at daily time steps.

Vegetation is a fundamental component of every ecosystem, and water is one of its most important biochemical components, making up 35-95% of the vegetation body. VWC yields information about the physiological condition of the plants. Furthermore, estimating VWC from local to global scales is central to understanding biomass burning processes, water stress and drought conditions. The prediction of this variable can also be important for irrigation strategies and yield forecasting (Pennuelas et al., 1993).

Spaceborne and airborne microwave sensors are best suited for the detection of water content (Ulaby et al., 1986). The retrieval of biophysical parameters from remotely sensed data falls within the category of inverse problems where, from a vector of measured values, m, one wishes to infer the set of ground parameters, x, that gave rise to them. The inverse problem is typically ill-posed because of the non-linear relationship between remote sensing measurements and ground parameters. Furthermore, many characteristics of natural surfaces, such as surface roughness and the amount and type of vegetation, alter the radar backscatter.

Many approaches have been developed to solve these inverse problems, ranging from empirical and semi-empirical approaches to sophisticated machine learning techniques.

Empirical models have been developed both as a first step in studying the relationship between remotely sensed signals and surface parameters and as simple inversion models in themselves. The frequently used linear approach is based on regression coefficients derived from observations over a specific site (Prevot et al., 1993; Dubois et al., 1994). One of the first empirical models was proposed by Oh et al. (1992) for bare soils, where the co-polarized and cross-polarized ratios of the backscattering coefficients are expressed in terms of the surface parameters. The Oh model, which was developed from multi-polarization radar data, proved to be poorly effective when tested on synthetic aperture radar (SAR) data. Subsequently, Dubois et al. (1995) developed an empirical inversion model from scatterometer data and applied it to SAR data in the case of bare soils. The Dubois inversion model was found to be applicable to the different forms of measured data and tends to be quite accurate, with a root-mean-square error (rmse) of 4.2% on SM values. Although the Dubois model performed well, it is site specific and is only valid under the conditions in which the measurements were taken. Because of the way empirical models and their inversion procedures are developed, they have a limited range of applicability. The complexity and nonlinearity of the problem cannot be taken into account in empirical formulations, which makes it necessary to consider theoretical backscattering models.

Many theoretical models have been developed to describe the interaction between electromagnetic radiation and natural surfaces. They can represent a great variety of situations, including cases that have not been taken into account by the empirical models. On the other hand, theoretical models are developed under several hypotheses that may not be completely verified in field conditions. One main limitation of theoretical models is the description of the surface morphology. One of the most widely used descriptions is based on two parameters: 1) the standard deviation (SD) of heights *s* and 2) the correlation length *l*. The SD of heights measures the variability of the vertical dimension of the soil surface profile, whereas the correlation function describes the statistical correlation between any two points on a given surface. The surface correlation length *l* is usually defined as the displacement for which the correlation function is equal to 1/*e* (Ulaby et al., 1986). This parameterization is often considered critical because it does not completely describe the variability of natural surfaces (Mattia & Le Toan, 1999). Moreover, while the SD of heights can be measured with an accuracy of only about 10%, correlation length measurements can vary by as much as an order of magnitude (Dubois et al., 1995; Notarnicola et al., 2003). Although they have the capacity to generalize and treat a great variety of situations, theoretical forward-scattering models are of a certain complexity and are sometimes difficult to invert because of the several parameters required in the computations.
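The two roughness parameters defined above can be illustrated with a short Python sketch. It is only a minimal example under stated assumptions: the function names are hypothetical, the profile is a list of sampled heights, and the correlation length is read from an already normalized autocorrelation function by linear interpolation at the 1/*e* level. It is not the procedure used for the field measurements.

```python
import numpy as np

def rms_height(profile):
    """Standard deviation of heights s, in the same units as the profile."""
    z = np.asarray(profile, dtype=float)
    return float(np.std(z - z.mean()))

def correlation_length(acf, dx):
    """Lag at which a normalized autocorrelation first drops to 1/e.

    acf : autocorrelation samples with acf[0] == 1, spaced dx apart
    dx  : horizontal sampling interval (same units as the result)
    The crossing is located by linear interpolation between two samples.
    """
    acf = np.asarray(acf, dtype=float)
    target = 1.0 / np.e
    below = np.nonzero(acf <= target)[0]
    if below.size == 0:
        raise ValueError("autocorrelation never drops to 1/e")
    i = int(below[0])
    if i == 0:
        return 0.0
    # Fraction of the interval [i-1, i] at which the 1/e level is crossed
    frac = (acf[i - 1] - target) / (acf[i - 1] - acf[i])
    return float((i - 1 + frac) * dx)

# Illustrative use with an exponential correlation function of length 5 cm
lags_cm = np.arange(300) * 0.1
l_cm = correlation_length(np.exp(-lags_cm / 5.0), dx=0.1)
```

Because the correlation length depends on where a slowly decaying function crosses a fixed level, small changes in the estimated autocorrelation can shift it considerably, which is consistent with the large measurement variability reported in the text.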

To overcome this difficulty, typical inversion techniques are iterative methods and statistical approaches. Bindlish and Barros (2000) used the integral equation model (IEM) with the Jacobian method, an iterative scheme, to perform the inversion on multifrequency, multipolarization SAR data from Washita ’94. In this case, the retrieval can be performed on all the surface parameters, as they are included in the IEM. This algorithm, which was tested on only one data set in a single sensor configuration, produces SM estimates with an average error of 3.4%. Statistically based inversion methods, such as the Bayesian approach, have been in existence for a long time and are based on the probabilities that a given set of measurements comes from certain surface parameter values. The probability density functions (pdfs) are estimated by training, where samples of sensor and surface measurements are presented to the algorithm. The practical use of Bayes’ theorem is to turn probabilities that can be estimated from a training set into those that are required for the estimation of the unknown surface parameters (Marchant & Onyongo, 2003). A useful property of a Bayesian method is that it is optimal in the sense that it minimizes the expected error. Another important aspect is that a large amount of experimental data is needed to derive the general pdfs required by the Bayesian methodology. The experimental data should cover a wide spectrum of real situations to obtain reliable statistical functions, but the inversion technique itself does not represent “the solution.” In fact, the inversion procedure has the same limitation as the forward model, as it relies on limited surface parameter conditions. As an example, Haddad and Dubois (1994), starting from the forward model proposed by Oh et al. (1992), used a Bayesian approach to determine the inverse model. As the model was based on a data set with a low correlation length, it was not applicable to data sets that did not satisfy this condition.

A suitable method for this kind of multidimensional retrieval is the neural network (NN). It can be trained to extract surface parameters from remotely sensed data, and in this way, it can perform the same function as a statistical inversion method. The training data for the NNs can be obtained from theoretical forward-scattering models, thus allowing control of the range of parameters with which the network is trained. Artificial NNs (ANNs) have a number of advantages and disadvantages compared to conventional statistical algorithms. One advantage of an NN is that it can identify subtle and nonlinear patterns, which is not always the case with traditional statistical methods (Beale & Jackson, 1992). In addition, NNs do not require normally distributed continuous data and may be used to integrate data from different sources with poorly defined or unknown distributions. Another advantage is that NNs are able to take a specific set of input data and generalize a solution set, which may give the correct answer for unknown input patterns that are similar, but not identical, to the input data. One of the problems is the difficulty in adequately configuring and training a network. There are no given rules for the configuration of the network (in terms of the number of hidden nodes, hidden layers, etc.). The training process has to be carefully controlled because of the risk of overtraining the network. Overtraining is a phenomenon whereby the network learns a training data set to an excellent level but cannot accurately predict the correct answer with independent test data. Furthermore, overtraining frequently happens when the number of training samples is limited, as is often the case with remotely sensed data sets (Notarnicola et al., 2008).

The main drawback of an NN is that the inverse empirical mapping established between remotely sensed data and surface parameters cannot be explicitly written down, and the user can generally only act on some configuration parameters but not on the analytical expression that leads to the results.

New approaches for the estimation of biophysical parameters have emerged in recent years; one of the most widely used is Support Vector Regression (SVR).

SVR, initially developed for classification purposes, is now also being applied to the estimation of biophysical parameters. SVR is based on a geometrical rather than a statistical approach: it bases the estimation on the geometrical distances between samples and on the maximization of the geometrical margin, instead of on the estimation of the posterior probability distribution over the samples. For this reason, it offers two main advantages with respect to NNs and statistical approaches: the SVR method is less sensitive than other machine learning techniques both to the limited availability of training samples and to overfitting, thus leading to high generalization capabilities (Camps-Valls et al., 2006). To date, this approach has not yet been applied to SM estimation.

Another way to overcome the difficulties of the single approach is to use the concept of ensemble. Ensembles are widely used in machine learning techniques and the main idea of ensemble learning is to employ multiple learners and combine their predictions.

The last decade has seen many works related to ensemble learning systems. These systems are groups of machine learning approaches in which each learner provides an estimate of a target variable; the estimates are then combined in different ways in order to reduce the generalization error with respect to the single learner (Brown et al., 2005).

The different estimates are usually combined through a combination function, commonly a majority vote for classification and a linear combination for regression. The combined estimate improves considerably when the individual estimators exhibit different patterns of generalization.

As an example, some works on ensembles of neural networks are reported here. Cho and Kim (1995) combined the results from multiple neural networks using fuzzy logic, which resulted in more accurate classification. Bishop (1995) shows that if L networks produce errors which have zero mean and are uncorrelated, then the sum-of-squares error can be reduced by a factor of L simply by averaging the predictions of the L networks. Liu and Yao (1999) proposed the Negative Correlation Learning (NCL) algorithm, in which a penalty term is added to the error function; this term helps to make the individual predictors as different from each other as possible while encouraging the accuracy of each individual predictor. This enables the mapping function learnt by the ensemble to generalize better when an unseen input is to be processed.
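The averaging result attributed to Bishop (1995) can be checked numerically with a small Python sketch. The numbers used here (10 members, unit-variance Gaussian errors, 200,000 samples, a fixed random seed) are illustrative assumptions and do not come from the chapter; the point is only that, for zero-mean uncorrelated errors, the mean squared error of the averaged prediction is about 1/L of the average member error.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 10          # number of ensemble members (illustrative)
n = 200_000     # number of test samples (illustrative)

# Zero-mean, mutually uncorrelated errors with unit variance, one row per member
errors = rng.normal(0.0, 1.0, size=(L, n))

# Average squared error of the individual members
mse_single = float(np.mean(errors ** 2))

# Squared error of the ensemble obtained by averaging the L predictions
mse_ensemble = float(np.mean(errors.mean(axis=0) ** 2))

ratio = mse_single / mse_ensemble   # expected to be close to L
```

In practice the errors of different retrieval methods are correlated, so the actual gain is smaller than a factor of L. This is why diversity among the members matters.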

In this context, this chapter addresses established remote sensing procedures, such as empirical models and Bayesian methods, for the estimation of SM and VWC from multi-frequency and multi-polarization SAR images in synergy with optical sensors and electromagnetic models. Initially, the procedures are used as separate inversion methods, and their limitations and potential are illustrated. Subsequently, each method is considered as an element of an ensemble from which the best estimates are drawn.

The basic concept behind this ensemble method is that each single methodology has its advantages and disadvantages and is able to detect some features with high accuracy and other features with low accuracy. Numerous works in different contexts have demonstrated that the accuracy of the ensemble estimate is quite often much higher than the accuracy of the single predictor (Ueda & Nakano, 1996).

This work presents an innovative approach to the ensemble of regression algorithms, which considers both different regression techniques and different sensor configurations, thus exploiting the capability of different frequency/polarization combinations to estimate soil and vegetation features.

The chapter is organized as follows. Section 2 is devoted to the description of the analyzed experimental data sets. Section 3 illustrates the most used electromagnetic models, whose simulations are used in the inversion procedures. These procedures are outlined in section 4. The results of the different procedures are discussed in section 5. Section 6 introduces the concept of *ensemble estimates* and discusses the results of this technique with respect to the results obtained from the individual procedures. Conclusions and future applications are drawn in section 7.

| Symbol / acronym | Meaning |
| --- | --- |
| SM | Soil moisture |
| GSM | Gravimetric soil moisture |
| VWC | Vegetation water content |
| VSM | Volumetric soil moisture |
| σ^{0} | Backscattering coefficient |
| τ^{2} | Two-way attenuation of the vegetation layer |
| ε | Dielectric constant |
| SD | Standard deviation |
| s | Standard deviation of height |
| l | Correlation length |
| IEM | Integral Equation Model |
| WCM | Water Cloud Model |

## 2. Data set description

SMEX’02 is a remote sensing experiment that was carried out in Iowa in 2002 (http://nsidc.org/data/amsr_validation/soil_moisture/smex02/), mainly focused on modelling and algorithm validation over a range of SM conditions with moderate to high vegetation biomass. The main site, chosen for intensive sampling of SM, vegetation and surface roughness, was the Walnut Creek watershed (Figure 1), where 32 fields, 10 soybean and 21 corn fields, were sampled intensively. The field and sensor data acquired during this experiment are particularly suitable for our analysis because of:

- the number of fields considered in the experiment, with different levels of soil and vegetation moisture;

- the acquisition of both radar and optical data and the extensive ground measurements carried out within each field.

### 2.1. Soil moisture measurements

SM sampling in the watershed sites was designed to provide a reliable estimate of the mean and variance of the surface volumetric SM for fields that are approximately 800 m by 800 m. These measurements are used primarily to support the aircraft-based microwave investigations, which were conducted between 0900 and 1200 local time. At four standard locations in each site, the gravimetric soil moisture (GSM) was sampled on each day of sampling with a 0-6 cm scoop tool. This GSM sample was then split into 0-1 cm and 1-6 cm samples, providing a rough estimate of the site-average 0-1 cm GSM. Bulk density was sampled once at each of these four locations using an extraction technique. GSM is converted to volumetric soil moisture (VSM) by multiplying it by the bulk density of the soil, so that VSM values are derived from the two parameters directly measured in the fields. The soil texture data for the SMEX’02 study area were obtained from the CONUS-SOIL dataset (Miller & White, 1998). Soil texture is of the utmost importance in physical models for the estimation of soil dielectric properties: the Hallikainen empirical formula derives the soil dielectric constant from SM and soil texture values (Hallikainen et al., 1985). The values of the real part of the dielectric constant, along with the roughness parameters, are the inputs to the theoretical models used in this inversion approach, as described in the following sections.
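The conversion from gravimetric to volumetric soil moisture can be written as a one-line Python function. This is a minimal sketch: the function name and the explicit water density argument (1 g/cm^3 by default) are assumptions added to keep the units clear, and the sample values are purely illustrative.

```python
def volumetric_soil_moisture(gsm, bulk_density, water_density=1.0):
    """Convert gravimetric to volumetric soil moisture.

    gsm           : gravimetric soil moisture (g water / g dry soil)
    bulk_density  : soil bulk density (g/cm^3)
    water_density : density of water (g/cm^3), assumed 1.0
    Returns VSM in cm^3 of water per cm^3 of soil.
    """
    if gsm < 0 or bulk_density <= 0:
        raise ValueError("gsm must be >= 0 and bulk_density must be > 0")
    return gsm * bulk_density / water_density

# Illustrative values: GSM of 0.20 g/g and bulk density of 1.3 g/cm^3
vsm = volumetric_soil_moisture(0.20, 1.3)   # about 0.26 cm^3/cm^3
```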

### 2.2. Vegetation water content measurements

VWC (kg/m^{2}) was measured in four sampling rounds over the 32 fields. The water content of plant stems and leaves was computed as

where B_{g}’ is the green biomass + tare weight and B_{d}’ is the dry biomass + tare weight. This assumes that water loss from the tares (paper bags) was negligible in comparison with that from the plant samples. In row crops, areal stand density (ASD; plants/m^{2}) was estimated from the row plant density (RD; plants/m) by using

where RS is the row spacing. VWC (kg/m^{2}) was then computed as

However not every field was sampled during each round. This implies that a measured VWC value is not available for all days of acquisitions. For each field-date combination, three locations in the field were visually selected from airborne digital imagery to represent average, minimum and maximum canopy conditions. Above ground biomass was removed and wet and dry weights were used to compute VWC. For this investigation, all samples within a field on a given date were averaged and this single value was used.
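The VWC computation described above can be sketched in Python as follows. The sketch rests on several assumptions that are not confirmed by the text: the weights B_{g}’ and B_{d}’ are taken to refer to a single plant and to be expressed in kg, the areal stand density is taken as RD divided by RS, and the field value is the plain average over the sampled locations. All function names and numbers are hypothetical.

```python
def location_vwc(green_plus_tare_kg, dry_plus_tare_kg, row_density, row_spacing_m):
    """VWC (kg/m^2) at one sampling location.

    green_plus_tare_kg : green biomass + tare weight, B_g' (kg, assumed per plant)
    dry_plus_tare_kg   : dry biomass + tare weight, B_d' (kg, assumed per plant)
    row_density        : row plant density RD (plants/m)
    row_spacing_m      : row spacing RS (m)
    """
    # The tare weight cancels in the difference (tare water loss neglected)
    water_per_plant = green_plus_tare_kg - dry_plus_tare_kg
    # Areal stand density ASD (plants/m^2), assumed to be RD / RS
    asd = row_density / row_spacing_m
    return water_per_plant * asd

def field_vwc(locations):
    """Average VWC over the sampled locations of one field on one date."""
    values = [location_vwc(*loc) for loc in locations]
    return sum(values) / len(values)

# Illustrative numbers only: 0.6 kg of water per plant, 6 plants/m, 0.75 m rows
example = location_vwc(0.9, 0.3, 6.0, 0.75)   # about 4.8 kg/m^2
```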

Other ground truth measurements used in this work include surface roughness in terms of standard deviation of heights and correlation length.

### 2.3. Remotely sensed data

The AirSAR images (resolution: 8-12 m in ground range) were acquired on 1, 5, 7, 8 and 9 July 2002. The LANDSAT images (resolution: 30 m) were acquired concurrently with the SAR data on 1 and 8 July 2002.

The five L- and C-band images were processed by the AirSAR operational processor providing calibrated data sets. The absolute and relative calibration accuracy obtained for each sensor, as reported in the literature (van Zyl, 1992), are listed in table 2.

| Sensor (absolute / relative accuracy) | C band | L band |
| --- | --- | --- |
| AirSAR | ±1.0 dB / ±0.4 dB | ±1.2 dB / ±0.5 dB |

According to sensitivity studies (Dubois et al., 1995), in order to avoid errors larger than 4.2% in the SM estimation, the relative calibration error should be less than 0.5 dB and the absolute calibration error should be less than 2.0 dB; the inversion is more sensitive to relative than to absolute calibration errors.

During the primary study period of the campaign, two Landsat Thematic Mapper (TM) scenes from Landsat 5 and three Landsat Enhanced Thematic Mapper plus (ETM+) scenes from Landsat 7 were acquired. They were mainly used to calculate the brightness temperature and two indices, the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Water Index (NDWI). These two indices are also very important in estimating VWC, which is needed for SM estimation using microwave methods. The images were atmospherically and radiometrically corrected to produce the at-ground reflectance, from which the NDVI and NDWI indices were then derived (Gao et al., 1996).
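Both indices are normalized differences of two reflectance bands, which can be sketched in Python as below. The function names are hypothetical, the inputs are assumed to be at-ground reflectances, and the choice of the shortwave-infrared band for NDWI is an assumption of this sketch (the text does not state which Landsat band was used).

```python
import numpy as np

def normalized_difference(band_a, band_b):
    """(a - b) / (a + b), computed element-wise; NaN where a + b == 0."""
    a = np.asarray(band_a, dtype=float)
    b = np.asarray(band_b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return normalized_difference(nir, red)

def ndwi(nir, swir):
    """Normalized Difference Water Index from NIR and a SWIR band (assumed)."""
    return normalized_difference(nir, swir)

# Illustrative reflectance values for a single vegetated pixel
pixel_ndvi = ndvi([0.5], [0.1])[0]
```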

In this work, the data acquired on 1^{st} July and some data taken randomly from the other dates were considered as training samples. Not restricting the training set to a single day makes the results independent of the specific soil and weather conditions of that date.

## 3. Electromagnetic models

As the proposed approaches, both empirical and statistical, make use of simulated data in different ways, the theoretical models for bare and vegetated soils are briefly described here. For bare soil, the SAR response has been simulated by means of the Integral Equation Model (IEM) (Fung, 1994). Compared with other electromagnetic models, this model has the advantage of being applicable to a wide range of roughness scales. The input parameters of the model are the real part of the dielectric constant, the standard deviation of heights and the correlation length. The dielectric constant is linked directly to VSM and soil texture through well-known and validated experimental relationships (Hallikainen, 1985).

In the IEM formulation, the like polarized backscattering coefficients for surfaces with small or medium roughness are given by:

where *k* is the wave number, *θ* is the incidence angle, k_{z} = k cos θ, k_{x} = k sin θ, pp refers to the horizontal (HH) or vertical (VV) polarization state and s is the standard deviation of terrain heights. R_{H} and R_{V} are the Fresnel reflection coefficients in horizontal and vertical polarizations, which depend directly on the dielectric constant. The symbol W^{(n)}(-2k_{x}, 0) is the Fourier transform of the n^{th} power of the surface correlation coefficient. In this context, an exponential correlation function has been adopted, as it seems to better describe the properties of natural surfaces (Fung, 1994).
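The dependence of the Fresnel reflection coefficients on the dielectric constant can be illustrated with a short Python sketch. It uses the standard expressions for a plane interface between air and a medium of relative dielectric constant ε; the function name is hypothetical and the sign convention for R_{V} is one common choice, which may differ from the one used in the IEM reference.

```python
import numpy as np

def fresnel_coefficients(eps, theta_deg):
    """Fresnel reflection coefficients (R_H, R_V) for an air/soil interface.

    eps       : relative dielectric constant of the soil (real or complex)
    theta_deg : incidence angle in degrees
    """
    theta = np.radians(theta_deg)
    cos_t = np.cos(theta)
    root = np.sqrt(complex(eps) - np.sin(theta) ** 2)
    r_h = (cos_t - root) / (cos_t + root)
    r_v = (eps * cos_t - root) / (eps * cos_t + root)
    return r_h, r_v

# At normal incidence with eps = 4 the magnitudes coincide: |R_H| = |R_V| = 1/3
r_h, r_v = fresnel_coefficients(4.0, 0.0)
```

A wetter soil has a larger dielectric constant and therefore larger reflection coefficients, which is the physical basis for the sensitivity of the backscattering coefficient to SM.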

For vegetated soils, a simple approach based on the so-called water-cloud model (WCM) was developed by Attema and Ulaby (1978), who proposed to represent, in a radiative transfer model, the vegetation canopy as a uniform cloud whose spherical droplets are held in place structurally by dry matter. The WCM represents the power backscattered by the whole canopy, σ^{0}, as the incoherent sum of the contribution of the vegetation, σ^{0}_{veg}, and the contribution of the underlying soil, σ^{0}_{soil}, which is attenuated by the vegetation layer through the two-way vegetation transmissivity τ^{2}. For a given incidence angle, the backscattering coefficient is represented by the general form:

$$ \sigma^{0} = \sigma^{0}_{veg} + \tau^{2} \, \sigma^{0}_{soil} $$

In particular, this expression can be written in more detail as:

where VWC is the vegetation water content (kg/m^{2}), θ is the incidence angle, σ^{0}_{soil} is the backscattering coefficient of bare soil, which in this case is calculated by using the IEM, and τ^{2} is the two-way vegetation transmissivity, with τ^{2} = exp(-2B VWC / cos θ). The parameters A and B depend on the canopy type and have to be determined for each canopy type in an initial calibration phase.
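A minimal Python sketch of the water-cloud model is given below. The transmissivity follows the expression stated in the text, and the total backscatter is the incoherent sum of the vegetation term and the attenuated soil term. The vegetation term itself, A·VWC·cos θ·(1 − τ^{2}), is one common parameterization of the WCM and is an assumption here, since the detailed form used in the chapter is not reproduced. All quantities are in linear power units, and the function names and parameter values are hypothetical.

```python
import math

def two_way_transmissivity(vwc, b, theta_deg):
    """tau^2 = exp(-2 B VWC / cos(theta)), as stated in the text."""
    return math.exp(-2.0 * b * vwc / math.cos(math.radians(theta_deg)))

def water_cloud_backscatter(sigma0_soil, vwc, a, b, theta_deg):
    """Total canopy backscatter (linear units) from the water-cloud model.

    sigma0_soil : bare-soil backscattering coefficient (linear units)
    vwc         : vegetation water content (kg/m^2)
    a, b        : canopy-dependent parameters found by calibration
    """
    tau2 = two_way_transmissivity(vwc, b, theta_deg)
    cos_t = math.cos(math.radians(theta_deg))
    # ASSUMED vegetation term: a common WCM parameterization, not confirmed here
    sigma0_veg = a * vwc * cos_t * (1.0 - tau2)
    return sigma0_veg + tau2 * sigma0_soil

def to_db(sigma0_linear):
    """Convert a backscattering coefficient from linear units to dB."""
    return 10.0 * math.log10(sigma0_linear)

# With no vegetation the model reduces to the bare-soil backscatter
bare = water_cloud_backscatter(0.05, 0.0, a=0.1, b=0.1, theta_deg=40.0)
```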

In this work, the model simulations enter the inversion procedures in different ways. For the Bayesian approach, the simulated data are generated in order to compare them with the measured data and to create the noise probability density function (pdf), as detailed in the section devoted to this approach. For the empirical models, the formulation of the WCM has been used to identify all the scattering components that have to be taken into account in the interaction between the vegetation-soil system and the SAR signal.

## 4. Description of inversion methodologies

### 4.1. Empirical methods

The empirical approach has been developed in two separate versions, one for the VWC estimates and the other one for the SM estimates.

For VWC, the linear relationship has been modeled as follows:

where σ^{0}_{1} and σ^{0}_{2} are the backscattering coefficients in the following configurations:

- σ^{0}_{1} and σ^{0}_{2} are respectively σ^{0}_{HH} and σ^{0}_{VV} for C band;

- σ^{0}_{1} and σ^{0}_{2} are respectively σ^{0}_{HH} and σ^{0}_{VV} for L band;

- σ^{0}_{1} and σ^{0}_{2} are respectively σ^{0}_{HH} for C band and σ^{0}_{HH} for L band.

Some of the 32 fields were chosen randomly as training fields. The training data belong to the acquisitions carried out on 1^{st} July, thus ensuring that the comparison with the Bayesian approach is performed under identical training conditions. For the test, the data acquired on 8^{th} July were used. The choice of training and test data was dictated by the availability of Landsat images acquired concurrently with the SAR data. The correlation coefficients (R^{2}) and the F-values for the empirical correlations in the training data are listed in table 3.

| Empirical model | R^{2} | F | P |
| --- | --- | --- | --- |
| σ^{0}_{HH}, σ^{0}_{VV} for C band | 0.68 | 26.2 | < 0.05* |
| σ^{0}_{HH}, σ^{0}_{VV} for L band | 0.64 | 21.8 | < 0.05* |
| σ^{0}_{HH} C band / σ^{0}_{HH} L band | 0.68 | 26.4 | < 0.05* |
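A linear fit of VWC on two backscattering coefficients, together with its R^{2} and F statistic, can be sketched in Python as below. This is only an illustration: the linear form with an intercept is an assumption, the function name is hypothetical, and the data are synthetic values generated for the example rather than SMEX’02 measurements.

```python
import numpy as np

def fit_linear(predictors, target):
    """Ordinary least squares with intercept; returns (coefficients, R^2, F)."""
    y = np.asarray(target, dtype=float)
    cols = [np.ones(len(y))] + [np.asarray(p, dtype=float) for p in predictors]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    p = X.shape[1] - 1                 # number of predictors
    n = len(y)                         # number of training samples
    # F statistic of the regression with (p, n - p - 1) degrees of freedom
    f = float("inf") if r2 >= 1.0 else (r2 / p) / ((1.0 - r2) / (n - p - 1))
    return coef, r2, f

# Synthetic training set (illustrative only): backscatter in dB, VWC in kg/m^2
rng = np.random.default_rng(1)
s1 = rng.uniform(-15.0, -5.0, 30)
s2 = rng.uniform(-15.0, -5.0, 30)
vwc = 2.0 + 0.3 * s1 - 0.1 * s2 + rng.normal(0.0, 0.05, 30)
coef, r2, f = fit_linear([s1, s2], vwc)
```

The F value is compared with the critical value of the F distribution for (p, n − p − 1) degrees of freedom to decide whether the regression is significant at a chosen level.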

For SM, a different kind of empirical relationship has been assumed, because a simple linear relationship similar to the one for VWC did not produce acceptable results. An approach was proposed by Notarnicola et al. (2006), following an approach developed by Chen et al. (2003) and based on previous work by Dubois et al. (1995). This empirical approach was derived and tested on a subset of the SMEX’02 data, producing acceptable results. However, when applied to the whole data set, the results were not satisfactory. Therefore, in order to take into account the different components in the interaction of the SAR signal with the soil and vegetation, the empirical model was inspired by the theoretical vegetation model described in section 3.

The SM has been assumed to be a function of the backscattering coefficients, of VWC, of the roughness parameter s and of the roughness parameter multiplied by an attenuation factor expressed as exp(-VWC):

This relationship should take into account the following contributions due to the interaction among the signal, the canopy and the soil (Attema & Ulaby, 1978):

- the dependence on the backscattering coefficients is considered as a kind of mean value of the overall responses of soil, vegetation and their interaction;

- the dependence on VWC is fundamental, as VWC plays a key role in the retrieval of SM in these densely vegetated fields, as already demonstrated in Notarnicola et al. (2007). It quantifies the contribution of VWC to the detected signal;

- the contribution of the soil is divided into two terms: one is SM, which in this case is the parameter to be estimated, and the other is the roughness parameter s. As shown in previous studies (Notarnicola et al., 2007; Du et al., 2008), this last parameter plays an important role also for densely vegetated fields;

- the dependence on s*exp(-VWC) takes into consideration the double-bounce effect, which may appear especially for tall plants such as corn at shorter wavelengths (C band). The contribution of the soil is represented by the s parameter multiplied by exp(-VWC), which represents the attenuation of the signal from the soil due to the presence of vegetation.
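The terms listed above can be combined into a regression model, sketched here in Python. The sketch assumes that SM depends linearly on each term (two backscattering coefficients, VWC, s and s·exp(−VWC)) plus an intercept; the exact functional form used in the chapter is not reproduced, and the function names, coefficient values and data are hypothetical.

```python
import numpy as np

def sm_design_matrix(sigma1, sigma2, vwc, s):
    """Columns: intercept, sigma1, sigma2, VWC, s, s * exp(-VWC)."""
    sigma1, sigma2, vwc, s = (np.asarray(v, dtype=float)
                              for v in (sigma1, sigma2, vwc, s))
    return np.column_stack([np.ones_like(vwc), sigma1, sigma2,
                            vwc, s, s * np.exp(-vwc)])

def fit_sm_model(sigma1, sigma2, vwc, s, sm):
    """Least-squares coefficients of the assumed linear SM model."""
    X = sm_design_matrix(sigma1, sigma2, vwc, s)
    coef, *_ = np.linalg.lstsq(X, np.asarray(sm, dtype=float), rcond=None)
    return coef

def predict_sm(coef, sigma1, sigma2, vwc, s):
    """Apply fitted coefficients to new (test) data."""
    return sm_design_matrix(sigma1, sigma2, vwc, s) @ coef

# Synthetic, noise-free example with known (hypothetical) coefficients
rng = np.random.default_rng(2)
s1 = rng.uniform(-15.0, -5.0, 40)
s2 = rng.uniform(-20.0, -8.0, 40)
vwc = rng.uniform(0.5, 5.0, 40)
rough = rng.uniform(0.3, 2.0, 40)
true_coef = np.array([0.30, 0.010, -0.005, -0.020, 0.050, 0.100])
sm = sm_design_matrix(s1, s2, vwc, rough) @ true_coef
coef = fit_sm_model(s1, s2, vwc, rough, sm)
```

Fitting on training fields and calling `predict_sm` on the test fields mirrors the training/test split described in the text.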

The correlation coefficients and the F-values for the empirical correlations in the training data are listed in table 4.

| Empirical model | R^{2} | F | P |
| --- | --- | --- | --- |
| σ^{0}_{HH}, σ^{0}_{VV} for C band | 0.32 | 2.30 | > 0.05*** |
| σ^{0}_{HH}, σ^{0}_{VV} for L band | 0.42 | 3.41 | < 0.10** |
| σ^{0}_{HH} C band / σ^{0}_{HH} L band | 0.48 | 4.20 | < 0.05* |

The data in tables 3 and 4 illustrate the difficulty of inferring information about SM, especially in the case of the C band. If the data are further divided into two groups, soybean and corn fields, the correlation improves notably for corn (R^{2}=0.63), but it is not considered reliable according to the F test. For the soybean fields, the correlation does not change considerably with respect to the values shown in table 4.

The training data were used to evaluate the multiple regressions. The obtained relationships are then applied to the test data in order to verify their generalization capabilities and robustness. This analysis is illustrated in the section dedicated to the results comparison.

### 4.2. Bayesian methodology

The main aim is to infer the values of the surface parameters, S_{i}, which for vegetated soils are the soil dielectric constant ε, the standard deviation of heights s, the correlation length l and the vegetation water content VWC, by measuring features f_{1}, f_{2}, …, in this case represented by the backscattering coefficients σ_{1m}, σ_{2m}, …. The procedure is divided into a training phase and a test phase.

In the training phase, the conditional probability P(σ_{1m}, σ_{2m}, … | S_{i}) can be estimated by using the Bayes’ theorem from a part of the data. This is the probability of finding that particular vector of features σ_{i}, given specific values of S_{i}.

By using the IEM, theoretical values of the sensor responses are obtained in correspondence with the ground truth. These are compared with the experimental values by introducing random variables, N_{i}, which do not depend on ε, s and l and which represent noise factors such as the sensor noise, the error introduced by the IEM and the contribution of vegetation (Notarnicola et al., 2006). The problem consists in finding an estimate of P(σ_{1m}, σ_{2m}, … | S_{i}) by taking into account the presence of this noise factor N_{i} and setting the relationship between measured and simulated data as follows:

$$ \sigma_{im} = N_{i} \, \sigma_{ith} $$

where σ_{im} and σ_{ith} are respectively the measured and theoretical values of the sensor responses. Once the function P(σ_{1m}, σ_{2m}, … | S_{i}) has been calculated, the Bayes’ theorem allows for the calculation of the posterior probability from the above conditional probability and the prior probability:

$$ P(S_{i} \mid \sigma_{1m}, \sigma_{2m}, \ldots) = \frac{P(\sigma_{1m}, \sigma_{2m}, \ldots \mid S_{i}) \, P(S_{i})}{\int P(\sigma_{1m}, \sigma_{2m}, \ldots \mid S_{i}) \, P(S_{i}) \, dS_{i}} \tag{10} $$

In the case of bare fields, the theoretical values calculated by the IEM should be as close as possible to the measured ones, so the pdf mean should be close to 1, with a standard deviation that represents the field variability as well as the sensor error. For vegetated areas, the resulting pdf means should quantify the different behavior of the radar signal for bare and vegetated fields. Thus the pdfs should contain information on the vegetation parameters that influence the radar signal. In particular, a good correlation has been found between pdf means and VWC. Instead of correlating the pdf means directly with measured VWC, the estimates of this parameter obtained from a LANDSAT image have been considered. The purpose is to verify whether the pdf mean variations can be predicted using VWC derived from other remotely sensed data. The methodology for the calculation of VWC from LANDSAT images has been derived and tested in Jackson et al. (2004). The pdf means have been correlated with these LANDSAT-derived VWC values. A linear relationship has been assumed between pdf means and VWC, initially in the following form:

The general trend indicates that pdf means decrease as VWC increases. However, the trend is not uniform: for a group of data belonging to corn fields, the pdf mean is high even though the VWC is relatively high (around 4 kg/m^{2}), in contrast with the general trend. This group is made up of pdf values that indicate a relatively small difference between the measured and the theoretical backscattering coefficients. This may be ascribed to the presence of a rough surface, whose contribution to the theoretical backscattering coefficients is higher than that of a smooth surface (Ulaby et al., 1986).

Within the vegetated group, soybean fields are characterized by low values of s, around 0.6 cm, which determine low values of the theoretical backscattering coefficients. As a result, the ratio between measured and theoretical values is below 1 even if the vegetation is not very dense. On the other hand, the corn fields are characterized by higher values of s. The rougher surface contributes high theoretical backscattering coefficients and leads to values of the ratio that are not as low as expected for such dense corn vegetation.
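The statistics of the ratio between measured and theoretical backscattering coefficients can be sketched in Python as follows. The sketch assumes that the ratio is computed in linear power units and, only for illustration, that the noise factor is Gaussian; the chapter does not state the pdf family here, so the likelihood function below is an assumption, as are all names and values.

```python
import math
import numpy as np

def noise_factor_stats(sigma_measured, sigma_theoretical):
    """Mean and sample standard deviation of N = sigma_measured / sigma_theoretical."""
    ratio = (np.asarray(sigma_measured, dtype=float)
             / np.asarray(sigma_theoretical, dtype=float))
    return float(ratio.mean()), float(ratio.std(ddof=1))

def likelihood(sigma_measured, sigma_theoretical, mean_n, std_n):
    """P(sigma_m | surface parameters) for an ASSUMED Gaussian noise factor N.

    With sigma_m = N * sigma_th, the density of sigma_m is the density of N
    evaluated at sigma_m / sigma_th, divided by sigma_th (change of variables).
    """
    n = sigma_measured / sigma_theoretical
    pdf_n = (math.exp(-0.5 * ((n - mean_n) / std_n) ** 2)
             / (std_n * math.sqrt(2.0 * math.pi)))
    return pdf_n / sigma_theoretical

# Illustrative values: two training samples with ratios 1 and 2
mean_n, std_n = noise_factor_stats([0.02, 0.04], [0.02, 0.02])
```

In this picture, a pdf mean close to 1 corresponds to bare soil, while a mean below 1 reflects attenuation of the soil signal by vegetation.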

The correlation among pdf means, VWC and s has also been considered in the inversion procedures as a multiple fit:

Table 5 reports the correlation coefficients (R^{2}) for the considered remotely sensed data configuration for the linear relationships (11) and (12) between pdf means and VWC values and the linear relationships (13) and (14) among pdf means, VWC values and the roughness parameter s.

| Polarization/frequency | Only VWC R^{2} | VWC + roughness R^{2} |
|---|---|---|
| CHH+CVV | 0.23 | 0.50 |
| LHH+LVV | 0.61 | 0.85 |
| CHH+LHH | 0.37 | 0.52 |
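The two kinds of fit compared in Table 5 can be sketched with an ordinary least-squares routine. This is only an illustration: the functional forms (pdf mean = a + b·VWC, and pdf mean = a + b·VWC + c·s) are assumed here, since the exact expressions (11)-(14) are not reproduced, and the sample values of `vwc`, `s` and `pdf_mean` are hypothetical, not taken from the data set.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def least_squares(predictors, y):
    """Fit y = c0 + c1*x1 + ... + cp*xp through the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(ata, aty)


def r_squared(predictors, y, coeffs):
    """Coefficient of determination of the fitted linear model."""
    pred = [coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], p)) for p in predictors]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical sample: VWC (kg/m^2), rms height s (cm), pdf mean (dimensionless)
vwc = [0.5, 1.0, 2.0, 3.0, 4.0, 4.5]
s = [0.6, 0.7, 0.6, 1.5, 2.0, 1.8]
pdf_mean = [0.95, 0.90, 0.75, 0.80, 0.85, 0.75]

x_vwc = [(v,) for v in vwc]          # fit on VWC only
x_both = list(zip(vwc, s))           # multiple fit on VWC and s
coef_vwc = least_squares(x_vwc, pdf_mean)
coef_both = least_squares(x_both, pdf_mean)
print("R2, VWC only:", r_squared(x_vwc, pdf_mean, coef_vwc))
print("R2, VWC + s :", r_squared(x_both, pdf_mean, coef_both))
```

Because the VWC-only model is nested in the multiple fit, the second R^{2} can never be lower than the first on the same sample, which is the pattern visible in every row of Table 5.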

The aim of the training phase is to evaluate the pdf P(S_{i} | σ_{1m}, σ_{2m}, …), while in the test phase expression (10) is applied to the second half of the acquired data in order to verify the prediction capability of this methodology.

The dependence of the pdf means on the amount of VWC introduces a new variable in the inversion problem (Notarnicola et al., 2007) that can be used to extract the VWC values themselves from the radar signal. With the introduction of VWC as a new variable k, the posterior pdf expressed in (10) can be written as follows:

As the main interest was to extract dielectric constant values from which SM can be calculated (Hallikainen et al., 1985), a first integration of the pdf P(ε, s, l, k | σ_{1m}, σ_{2m}, …) is performed with respect to the roughness parameters, s and l, and to k over their range of values in order to obtain a marginal distribution:

This distribution represents the probability of the different dielectric constant values for the possible combinations of measured backscattering coefficients σ_{1m}, σ_{2m}, … (Notarnicola & Posa, 2004).

An analogous calculation can be performed for the variable k, which represents VWC. The pdf P(ε, s, l, k | σ_{1m}, σ_{2m}, …) has to be integrated over the whole range of dielectric constant values and roughness parameters in order to obtain a pdf that retains exclusively information on the VWC:

From this distribution the mean value and the variance of the estimator can be extracted (Gelman, 1995) as follows:

In all these calculations, the prior pdf of the parameters over which the integration is performed has to be specified. In the calculation of the marginal distributions, the prior pdf has been considered uniform across the whole possible range of values. This means that no supplementary information about these parameters was considered apart from their range of values. The dielectric constant has been integrated in the range from 2 to 20 and the VWC in the range from 0.1 to 8 kg/m^{2}. The integration window is [0.1 cm, 3.0 cm] for s and [0.1 cm, 21.0 cm] for l; these windows cover most of the surface measurements. The purpose was to verify the capability to extract dielectric constant and VWC values independently of roughness levels. This procedure has been applied to backscattering coefficients σ_{1m}, σ_{2m}, … in the following configurations:

- C band, HH and VV polarizations;
- L band, HH and VV polarizations;
- C and L band, HH polarization.
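The marginalization with uniform priors described above can be sketched numerically on a regular grid. The code below is a minimal illustration under stated assumptions: `toy_likelihood` is a hypothetical stand-in and is not the IEM-based pdf of the chapter, and the grid resolutions are arbitrary. Only the integration ranges are taken from the text.

```python
import math


def linspace(lo, hi, n):
    """n equally spaced values from lo to hi (inclusive)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def weighted_stats(grid, weights):
    """Mean and variance of a discretized (unnormalized) distribution."""
    total = sum(weights)
    mean = sum(g * w for g, w in zip(grid, weights)) / total
    var = sum((g - mean) ** 2 * w for g, w in zip(grid, weights)) / total
    return mean, var


def marginal_stats(likelihood, eps_grid, s_grid, l_grid, k_grid):
    """Marginalize the posterior over (s, l, k) for eps and over (eps, s, l) for k.

    With uniform priors on regular grids, the prior and the grid spacing are
    constant factors that cancel when the marginals are normalized.
    """
    eps_w = [0.0] * len(eps_grid)
    k_w = [0.0] * len(k_grid)
    for i, eps in enumerate(eps_grid):
        for s in s_grid:
            for l in l_grid:
                for j, k in enumerate(k_grid):
                    w = likelihood(eps, s, l, k)
                    eps_w[i] += w
                    k_w[j] += w
    return weighted_stats(eps_grid, eps_w), weighted_stats(k_grid, k_w)


def toy_likelihood(eps, s, l, k):
    """Hypothetical likelihood peaked at eps = 10 and k = 3, flat in s and l."""
    return (math.exp(-0.5 * ((eps - 10.0) / 1.0) ** 2)
            * math.exp(-0.5 * ((k - 3.0) / 0.5) ** 2))


eps_grid = linspace(2.0, 20.0, 19)   # dielectric constant
s_grid = linspace(0.1, 3.0, 5)       # rms height, cm
l_grid = linspace(0.1, 21.0, 5)      # correlation length, cm
k_grid = linspace(0.1, 8.0, 80)      # VWC, kg/m^2

(eps_mean, eps_var), (k_mean, k_var) = marginal_stats(
    toy_likelihood, eps_grid, s_grid, l_grid, k_grid)
print("dielectric constant: mean", eps_mean, "variance", eps_var)
print("VWC: mean", k_mean, "variance", k_var)
```

In a real application the likelihood would be replaced by the pdf evaluated at the measured backscattering coefficients, and finer grids would be used for s and l.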

## 5. Results of the single methodologies and relative comparison

As illustrated in the previous paragraphs, the inversion methodologies have been applied to different sensor configurations, in order to explore whether the combination of different polarizations and/or bands may help to extract the soil features. In fact, because C band and L band signals interact differently with the soil and the overlying canopy layer, they are sensitive to different surface characteristics. Their use is therefore important to the concept of the ensemble that will be described in the next section.

In this paragraph, the results of the empirical and Bayesian methodologies are illustrated and evaluated in terms of:

- Correlation coefficients, R^{2}, between the estimates and the ground truth values;
- Root Mean Square Error, RMSE, between the estimates and the ground truth values.
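The two evaluation metrics listed above can be computed as in the following sketch. It assumes that R^{2} is the squared Pearson correlation coefficient between estimates and ground truth, which is one common reading of the term; the function names are illustrative.

```python
import math


def rmse(estimates, truth):
    """Root mean square error between estimates and ground truth."""
    n = len(truth)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truth)) / n)


def r2(estimates, truth):
    """Squared Pearson correlation coefficient between estimates and ground truth."""
    n = len(truth)
    mean_e = sum(estimates) / n
    mean_t = sum(truth) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(estimates, truth))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    return cov * cov / (var_e * var_t)
```

Note that with this definition a biased estimator can still reach a high R^{2}, which is why the RMSE is reported alongside it.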

This analysis is carried out on the test data. Tables 6 and 7 list the performance of the single procedures for each sensor configuration, for VWC and SM estimates respectively.

For VWC, the best performances are obtained with the C band and with the combination of C and L band data for the Bayesian approach, while for the empirical approach only the L band retains the good performance obtained during the training phase.

| Methods | R^{2} | RMSE (kg/m^{2}) |
|---|---|---|
| Empirical C band | 0.20 | 2.44 |
| Empirical L band | 0.56 | 1.29 |
| Empirical C – L band | 0.25 | 2.27 |
| Bayesian C band | 0.64 | 1.30 |
| Bayesian L band | 0.46 | 1.46 |
| Bayesian C – L band | 0.55 | 1.29 |

| Methods | R^{2} | RMSE (cm^{3}/cm^{3}) |
|---|---|---|
| Empirical C band | 0.20 | 0.11 |
| Empirical L band | 0.0006 | 0.09 |
| Empirical C – L band | 0.05 | 0.12 |
| Bayesian C band | 0.14 | 0.11 |
| Bayesian L band | 0.17 | 0.08 |
| Bayesian C – L band | 0.47 | 0.05 |

As expected, the estimation of SM is quite difficult, with values of R^{2} not higher than 0.47 and RMSE values up to 0.12 cm^{3}/cm^{3}. The performance of the empirical and Bayesian approaches improves if the extreme values of SM are excluded from the error computation. These results are illustrated in table 8, where values of SM higher than 0.27 cm^{3}/cm^{3} and lower than 0.10 cm^{3}/cm^{3} have been excluded.

| Methods | R^{2} | RMSE (cm^{3}/cm^{3}) |
|---|---|---|
| Empirical C band | 0.11 | 0.06 |
| Empirical L band | 0.40 | 0.05 |
| Empirical C – L band | 0.42 | 0.08 |
| Bayesian C band | 0.22 | 0.10 |
| Bayesian L band | 0.45 | 0.05 |
| Bayesian C – L band | 0.65 | 0.02 |
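The exclusion of extreme SM values before the error computation can be sketched as a simple filter. It is assumed here that the thresholds (0.10 and 0.27 cm^{3}/cm^{3}) are applied to the ground truth values rather than to the estimates; the function name is illustrative.

```python
def exclude_extremes(estimates, truth, low=0.10, high=0.27):
    """Keep only the samples whose ground-truth SM lies in [low, high]."""
    kept = [(e, t) for e, t in zip(estimates, truth) if low <= t <= high]
    kept_estimates = [e for e, _ in kept]
    kept_truth = [t for _, t in kept]
    return kept_estimates, kept_truth
```

R^{2} and RMSE are then evaluated on the two returned lists instead of the full test set.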

This indicates that neither algorithm is able to predict the extreme values of the SM range. For low values, this is because the signal from the soil is weak and difficult to disentangle from the vegetation signal. For high values, the signal from the soil is strong, but in the case of C band the absorption by ‘narrow leaf’ plants, such as soybean, reduces the signal reaching the sensor (Macelloni et al., 2001). The L band estimates are the only ones able to predict the highest values of SM.

For the Bayesian methodology, similar analyses were also reported in Notarnicola et al. (2006). In that case, the methodologies were applied only to a few fields of the same data sets. With respect to the accuracy reported in Notarnicola et al. (2006), a worsening in the performance is found. In particular, the present data set includes all the fields in the WC basin, including the fields located in the eastern part, which exhibit anomalous values of SM: some very high values around 0.35 cm^{3}/cm^{3} and some values lower than 0.05 cm^{3}/cm^{3}.

If the watershed is divided into two parts, the western and the eastern part, the performances of the algorithm for SM retrieval differ significantly. The correlation coefficients R^{2} are equal to 0.33 and 0.70, not significantly different from those found in Notarnicola et al. (2006).

Furthermore, the performances change notably if the soybean and corn fields in the data set are considered separately. This happens only for the Bayesian approach, while the results for the empirical approach remain the same. The results for the Bayesian approach are reported in table 9.

Similar characteristics are also found in Lakhankar et al. (2009), where it is shown that the RMSE depends on the level of vegetation of the different fields. Furthermore, in the case of C band, the signal coming from the vegetation dominates over the signal coming from the soil. In fact, when the vegetation has a low VWC, as in the case of soybean fields, the C band is able to provide acceptable estimates of SM. In the case of corn fields, the best results are obtained with the combination of C and L band, one sensitive to VWC and the other to the surface contribution. In this case, the discrepancies may be ascribed to the fact that the Bayesian formulation does not take into account the double-bounce effect between the soil and the corn stalks. This effect could dominate in this kind of plant with broad leaves (Macelloni et al., 2001).

| Methods | R^{2} | RMSE (cm^{3}/cm^{3}) |
|---|---|---|
| Corn fields | | |
| Bayesian C band | 0.13 | 0.13 |
| Bayesian L band | 0.17 | 0.09 |
| Bayesian C – L band | 0.47 | 0.06 |
| Soybean fields | | |
| Bayesian C band | 0.69 | 0.03 |
| Bayesian L band | 0.18 | 0.07 |
| Bayesian C – L band | 0.67 | 0.04 |

## 6. Ensemble estimates and relative results

The idea of using ensemble concepts emerges from the previous analysis of the results of the single inversion techniques. Different wavelengths (C and L band) or their combination can be used to extract information according to different types of vegetation and different levels of SM and VWC. This information can be inserted in an ensemble approach. The problem can be formalized in the following way.

The knowledge about the function f, which performs the inversion from the signal domain to the feature domain, is represented by a learning sample of n independent observations:

An algorithm a is used to fit a model a(· | L) to the data L. Based on this model, certain objects of interest a(S_{i} | L), which describe the distribution of S_{i} given σ_{i}, can be computed. In this analysis, the algorithms are the linear regression and the Bayesian approach, and the objects of interest may be the predicted values of S_{i}.

At least for regression (S_{i} ∈ ℝ) and binary classification problems (S_{i} ∈ {−1, 1}), an ensemble a_{E} of K basis models can be written as a linear combination of K predictions derived from the models a_{k}, each fitted using a special learning sample L^{k}:

For real-valued responses in regression problems, the prediction of the ensemble is a weighted sum of the predictions of the basis models. The next part of the section is dedicated to finding the best way to create weighted estimates starting from the estimates of the single learners.
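A weighted sum of basis-model predictions can be written in a few lines. This is a generic sketch: normalizing the weights so that they sum to one is an assumption made here for convenience, and the function name is illustrative.

```python
def ensemble_predict(predictions, weights):
    """Weighted combination of K basis-model predictions.

    predictions : list of K lists, one list of estimates per basis model
    weights     : list of K non-negative weights (normalized internally)
    """
    total = sum(weights)
    n_samples = len(predictions[0])
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]
```

With equal weights this reduces to the plain average of the single learners; unequal weights let the more reliable learners dominate the combined estimate.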

In this case, the single learners are represented by the empirical and Bayesian approaches applied to different sensor configurations. As indicated in the section dedicated to the analysis of the results, the information given by the different approaches and the different configurations is in many cases complementary with respect to the type of vegetation and the SM values. They can therefore be considered as the members of an ensemble, and the main aim is to find the combination of members that leads to the best estimates for the inversion problem.

One of the main differences with respect to traditional ensemble techniques is that each single learner is trained separately, and only then are the estimates considered as part of an ensemble.

The inversion approaches have been applied to the training data in order to calculate the RMSE considering the following configurations:

- For SM, five different levels: 0.0-0.10, 0.10-0.15, 0.15-0.20, 0.20-0.25 and higher than 0.25 cm^{3}/cm^{3}. Within each of these groups there is a further distinction between corn and soybean.
- For VWC, three main groups: 0.0-1.0, 1.0-3.0 and higher than 3.0 kg/m^{2}.

For each of these groups, the RMSE values have been calculated in order to identify which of the six inversion procedures yields the lowest RMSE. The output of this procedure is illustrated in the following tables:

Methods/ranges | 0.0-1.0 | 1.0-3.0 | > 3.0 |

Empirical C band | |||

Empirical L band | x | ||

Empirical C – L band | |||

Bayesian C band | x | x | |

Bayesian L band | |||

Bayesian C – L band |

Methods/ranges | 0.0-0.10 | 0.10-0.15 | 0.15-0.20 | 0.20-0.25 | > 0.25 |

Empirical C band- corn | x | ||||

Empirical C band- soybean | |||||

Empirical L band -corn | |||||

Empirical L band- soybean | x | x | |||

Empirical C – L band - corn | x | x | |||

Empirical C –L band- soybean | |||||

Bayesian C band - corn | |||||

Bayesian C band - soybean | x | ||||

Bayesian L band - corn | x | x | |||

Bayesian L band - soybean | x | x | |||

Bayesian C – L band- corn | |||||

Bayesian C – L band - soybean |
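The selection of the lowest-RMSE procedure within each range of the training data can be sketched as follows. The binning by ground-truth value, the function name and the data layout (a dictionary from method name to its list of estimates) are assumptions made for this illustration; the separate treatment of corn and soybean would amount to calling the function once per crop.

```python
import math
from bisect import bisect_right


def best_method_per_range(truth, estimates, edges):
    """Return {bin_index: (method_name, rmse)} with the lowest RMSE in each bin.

    truth     : ground-truth values of the training samples
    estimates : dict mapping a method name to its list of estimates
    edges     : sorted inner bin edges; len(edges) + 1 bins are formed,
                and a value equal to an edge falls in the upper bin
    """
    bins = {}
    for idx, t in enumerate(truth):
        bins.setdefault(bisect_right(edges, t), []).append(idx)

    best = {}
    for b, members in bins.items():
        scores = {
            name: math.sqrt(
                sum((est[i] - truth[i]) ** 2 for i in members) / len(members))
            for name, est in estimates.items()
        }
        winner = min(scores, key=scores.get)
        best[b] = (winner, scores[winner])
    return best


# SM ranges of the text: <0.10, 0.10-0.15, 0.15-0.20, 0.20-0.25, >0.25 cm^3/cm^3
sm_edges = [0.10, 0.15, 0.20, 0.25]
```

Bins that contain no training sample are simply absent from the returned dictionary.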

This analysis is the basis for the application to the test data sets. The six approaches have been applied to the test data and the best estimates have been calculated using the following three steps:

Step 1. Calculation of the average of the estimates, considering all the values if they fall in the same range and excluding one or two values if they disagree with the others. If there is a conflict between an empirical estimate and a Bayesian one, the latter has been chosen, as it is the more reliable in many cases. This first step is useful to identify the range of the estimates and then to adopt the best estimator. For VWC, the range of the parameter has also been compared with the estimates derived from the Landsat images using the approach of Jackson et al. (2004).

Step 2. Each estimate has been associated with an RMSE derived from the training data, and a new mean has been calculated using only the three estimates with the lowest RMSE.

Step 3. A correction factor is applied to the two mean values calculated in steps 1 and 2, giving more weight to the mean value with the lowest variance. Furthermore, when high values of SM are present, the results from the Bayesian approach in L band have been used, as it is the only approach able to detect high values of SM.
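The three steps can be sketched as below. Several choices are assumptions of this illustration and are not specified in the text: disagreement in step 1 is measured as distance from the median with a tolerance `tol`, the correction factor of step 3 is implemented as inverse-variance weighting, and the preference for Bayesian estimates and the L-band override for high SM are left out.

```python
import statistics


def consistent_subset(estimates, tol):
    """Step 1: keep the estimates that lie within tol of the median."""
    med = statistics.median(estimates)
    kept = [e for e in estimates if abs(e - med) <= tol]
    return kept or list(estimates)  # fall back to all values if none agree


def lowest_rmse_subset(estimates, train_rmse, n_best=3):
    """Step 2: keep the n_best estimates whose methods had the lowest training RMSE."""
    order = sorted(range(len(estimates)), key=lambda i: train_rmse[i])
    return [estimates[i] for i in order[:n_best]]


def combine(group_a, group_b, var_floor=1e-8):
    """Step 3: weight the two group means by the inverse of their variance."""
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    w_a = 1.0 / max(statistics.pvariance(group_a), var_floor)
    w_b = 1.0 / max(statistics.pvariance(group_b), var_floor)
    return (w_a * mean_a + w_b * mean_b) / (w_a + w_b)


def ensemble_estimate(estimates, train_rmse, tol):
    """Combine the estimates of the single learners for one field."""
    return combine(consistent_subset(estimates, tol),
                   lowest_rmse_subset(estimates, train_rmse))
```

For one field, `estimates` would hold the six values produced by the single procedures and `train_rmse` the RMSE of each procedure in the corresponding range of the training data.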

The idea behind these procedures originates from the ability of the different procedures and configurations to detect specific soil and vegetation characteristics.

The output of these procedures has been reported in table 12.

| Ensemble | R^{2} | RMSE |
|---|---|---|
| VWC | 0.66 | 1.20 kg/m^{2} |
| SM | 0.83 | 0.03 cm^{3}/cm^{3} |

The results reported in table 12 indicate a notable improvement in the estimation of both SM and VWC in terms of the R^{2} between measured and estimated values. For the RMSE, the improvement is evident especially for SM, while for VWC the ensemble RMSE is similar to the one found for the Bayesian approach considered as a single learner. However, the VWC values were already quite well estimated by the single approaches, so the ensemble approach is not expected to improve the estimation much further, as revealed by comparing both R^{2} and RMSE (Brown et al., 2005). On the other hand, it is interesting to highlight the information for SM that has been extracted from the single learners and that contributes to the better estimates. The results of the ensemble technique are illustrated in figure 2 for VWC and in figure 3 for SM. In each figure, there are four graphs reporting the results from the three approaches with the highest accuracy and the ensemble results.

## 7. Conclusions and future applications

The main aim of this chapter is to illustrate the application of some standard inversion procedures, an empirical and a Bayesian approach, for the estimation of VWC and SM from radar images in cases of densely vegetated fields. In this analysis, the presence of vegetation strongly disturbs the evaluation of SM. Both methodologies make use of, or are related to, the formulation of theoretical electromagnetic models such as IEM for bare soils and WCM for vegetated fields. The approaches have been applied considering one frequency channel, C or L band, and their combination. In all cases, both co-polarized channels, HH and VV, have been used. Subsequently these single learners have been considered as members of an ensemble, and a procedure mainly based on variance minimization has been applied to derive the best estimates.

Results from the single learners indicate that for VWC:

- The algorithms are able to detect three main ranges: from 0.0 to 1.0 kg/m^{2}, from 1.0 to 3.0 kg/m^{2} and values higher than 3.0 kg/m^{2}.
- The Bayesian approach determines the best estimates, especially in terms of RMSE.
- In the case of the Bayesian approach, both C and L band can provide reliable estimates with high correlation coefficients and low RMSE values.

While for SM:

- The empirical approach works better if the extreme values of SM are excluded from the computation of R^{2} and RMSE. This demonstrates the ability of the approach to determine an average SM status, but in extreme situations, such as very low or very high values of SM, the algorithm is not sensitive enough to these values and is not able to disentangle the vegetation effect from the radar signal.
- The Bayesian approach is also sensitive to this problem, although to a lesser extent. In fact, the estimates improve if some anomalous SM values are eliminated. These high SM values are not correlated with high values of backscattering coefficients or VWC.
- In the Bayesian approach, the different use of C and L band emerges if soybean and corn fields are analyzed separately. In this case, for the corn fields, only the combination of C and L band can provide estimates with acceptable R^{2} and RMSE. For soybean fields, good results are obtained with both the C band and the combination of C and L band.

These analyses are the starting point from which the ensemble part derives. It is clear that no single method can provide reliable estimates for all types of soil condition in terms of vegetation and SM status. This is partly due to the limitations of the methods themselves (for example, empirical approaches are generally quite site specific), but in some cases each method or sensor configuration is able to detect some specific characteristics while being insensitive to others.

The ensemble approach used in this work considers the single estimates and determines the best estimates based on an approach which aims at minimizing the variance in an iterative way. Results from the ensemble learner indicate:

- For VWC the improvement is not so evident, partly because the single estimates were already good enough.
- The net improvement is evident for SM, where the different capability of each single learner to detect specific SM conditions emerges (e.g. the Bayesian approach in L band was the only one able to predict high values of SM).

For further validation, this new procedure will be applied to other data sets and enriched with other inversion techniques.