Comparison of user accuracy (UA), producer accuracy (PA), overall accuracy (OA), and kappa coefficient (in the lowest raw) for different filters for SAR data in 2016.
This chapter addresses an important practical task of classification of multichannel remote sensing data with application to multitemporal dual-polarization Sentinel radar images acquired for agricultural regions in Ukraine. We first consider characteristics of dual-polarization Sentinel radar images and discuss what kind of filters can be applied to such data. Several examples of denoising are presented with analysis of what properties of filters are desired and what can be provided in practice. It is also demonstrated that the use of preliminary denoising produces improvement of classification accuracy where despeckling that is more efficient in terms of standard filtering criteria results in better classification.
- dual polarization
- radar image
- agricultural area
Remote sensing (RS) systems have become tools of intensive everyday use for numerous applications in ecology, forestry, oceanology, agriculture, and disaster monitoring [1, 2, 3, 4, 5]. This is due to significant advances and achievements in science and technology. Firstly, imagers’ characteristics such as spatial resolution, swath width, receiver input signal-to-noise ratio, bandwidth and throughput, and near real-time data delivery have improved considerably. Secondly, it has become possible to process more data both onboard and on-land in real time or quickly enough. Thirdly, many existing spaceborne RS systems provide data with frequency needed (convenient) for performing monitoring for scenes under interest.
In particular, such monitoring is possible using RS data by Sentinel-1 A/B two-polarization synthetic aperture radar (SAR) that has been put into operation recently . There are several positive features of data (images) provided by this SAR. Firstly, radars are known to be able to operate well (to acquire images) during day and night in almost all weather conditions [7, 8]. Secondly, polarization radars (Sentinel SAR produces VV and VH polarization images) offer more opportunities for effectively solving different classification tasks than single-channel radars [8, 9]. Thirdly, Sentinel SAR is characterized by spatial resolution about 10 m, and data are available free of charge for noncommercial use. Finally, Sentinel satellites carry out global coverage with 6-day revisit frequency.
All together, these facilities of Sentinel SAR provide excellent prerequisites for solving tasks of agricultural monitoring where multitemporal radar images can be employed either separately or jointly with other types of images including optical and infrared ones [10, 11, 12]. However, radar images also have certain drawbacks. One of them is the presence of noise-like phenomenon called speckle [7, 13]. Although speckle intensity and probability density function (PDF) can be different depending upon the number of looks and other factors [7, 14, 15], speckle is the main factor that deteriorates radar image quality and prevents their efficient processing [16, 17] and classification .
Taking this into account, speckle noise has to be reduced. Speckle removal (despeckling) is not an easy task. There are several known peculiarities of speckle. Firstly, it is supposed to be pure multiplicative noise [7, 13]. Secondly, it has PDF that is usually non-Gaussian [7, 16]. Thirdly, speckle often possesses spatial correlation  that has to be taken into account.
There are many existing image processing software packages, which allow performing radar image pre-filtering such as ESA SNAP toolbox, ENVI, etc. In particular, they provide such good filters as Frost, Lee, refined Lee, etc. [6, 13]. However, there are three aspects worth taking into consideration. Firstly, selection of a proper filter and its parameters (scanning window size, thresholds, etc.) should be done based on careful analysis of speckle properties for SAR data supposed to be used (Sentinel SAR data in our case). Secondly, there are good despeckling filters that have not been yet implemented in toolboxes including filters based on discrete cosine transform (DCT) and/or that belong to a new family of nonlocal filters [20, 21]. Thirdly, efficiency of filtering should be assessed with respect to a final task as crop or agricultural area classification in our case where type of data, methodology of classifier learning, and other factors have essential impact on final classification .
The goal of this chapter is threefold. Analysis of speckle characteristics has to be carried out, and the obtained data can be considered as a prerequisite for choosing a proper technique (or several possible methods) of radar image processing. Performance of different despeckling methods has to be evaluated and compared where emphasis should be done on modern techniques. Besides, our intention is to assess the impact of filtering techniques on crop classification accuracy and provide practical recommendations.
2. Image and noise model and its parameters
In this book chapter, we focus on processing Sentinel SAR images. It is known from the literature that the dominant factor degrading SAR image quality is speckle which is supposed to be a specific type of pure multiplicative noise [7, 14], i.e.:
where denotes the ijth sample of the kth component for a considered multichannel image, is the ijth value of the multiplicative noise in the kth component image, is the true value for the kijth pixel, I and J define the data size, and K denotes the number of components (K = 2 for two-polarization data that are mainly considered below). Random variable is supposed to have unity mean and variance that, in general, can be different for different component images.
To the best of our knowledge, the assumption on pure multiplicative nature of speckle in Sentinel images has not been tested. Other properties of speckle such as spatial correlation have not been thoroughly analyzed yet as well. Thus, we have performed a preliminary analysis. First, our study has been performed for manually selected image fragments. Examples of such fragments (all of rectangular shape) are shown by frames for vertical-vertical (VV) and vertical-horizontal (VH) polarization components of SAR image fragments presented in Figure 1. Analysis has confirmed the assumption that speckle is pure multiplicative. The estimated value of its relative variance is about 0.05 for both components (k = 1 and 2), i.e., we deal with multi-look data. Automatic blind estimation  has resulted in almost the same estimates of that varied only slightly (no more than 10–15% for 19 analyzed VV and 19 studied VH images) (Figure 2).
Visual inspection of images in Figure 1 also shows that they are similar and different simultaneously. The objects might have different contrasts with respect to background or neighbor objects. Some objects can be present in image with one polarization and be absent in image of other polarizations. For example, there is a rectangular shape high-intensity object in the central part of VV image (Figure 1, left) that is not observed in VH image (Figure 1, right).
Cross-correlation factor values for VV and VH polarizations (under condition that they are jointly registered or superimposed with high accuracy) are about 0.8. Speckle has been found practically uncorrelated between component images. These properties will be taken into account in consideration of filters to be applied.
Another important property of speckle is its spatial correlation. There are different approaches to describe and analyze it. One approach is to obtain 2D autocorrelation functions in homogeneous image areas [7, 22]. Under condition of a priori known , such areas can be detected by comparing relative variance (calculated by Eq. 2) to . Here, it is supposed that the considered area is of rectangular shape defined by indices in one direction and in the other direction; is threshold that can be set approximately equal to 1.2; is the local mean (calculated by Eq. 3):
Below we characterize speckle correlation in another manner, using spectrum in DCT domain. The reasons for this will become clear later, in Section 3. Suppose that we have found N homogeneous image blocks of size 8×8 pixels, i.e., blocks for which
where is the local mean for the nth block. 2D DCT is then calculated for each nth block with obtaining where lm are indices of spatial frequencies, l = 1 and m = 1 relate to DC (direct current) term proportional to the block mean, and alternating current (AC) DCT components with larger l and m relate to higher spatial frequencies. Power spectrum shape estimate is obtained through Eq. 5. Then, the normalized DCT spectrum is obtained through Eq. 6:
where, in fact, is not of interest for us since it is not taken into account in denoising (see the details in Section 3). There exist also other methods of estimation  including blind methods [24, 25].
The obtained estimates are presented in Figure 3 for both polarization components. Bin heights that reflect spectrum values clearly demonstrate that speckle is spatially correlated (spectrum values are considerably smaller for higher spatial frequencies). These correlation properties are quite similar for VV and VH polarizations as well as for different images analyzed.
Thus, it is possible to state that speckle in Sentinel SAR images is practically pure multiplicative with . Its distribution is not Gaussian—this has been shown by Gaussianity tests. Speckle is spatially correlated and practically independent for different polarizations. All these properties should be taken into consideration at image processing stage.
3. Filters and their impacts on images
First of all, let us recall requirements to filters applied to radar image processing. The main requirement is, certainly, effective suppression of speckle noise. However, edge/detail/texture preservation is important as well. Additional requirement is to retain mean level in homogeneous image regions (some filters do not perform this automatically, without specific correction ). This is important since mean in homogeneous regions is strictly connected with calibration . Finally, computational efficiency is important, especially in the considered application since standard size of Sentinel SAR images we deal with is large, ~25k × 16k pixels. Note that some requirements are contradictory . In particular, it is difficult to provide a trade-off between speckle suppression and edge/detail/texture preservation.
If we deal with multichannel images (multispectral, hyperspectral, multipolarization ones), there are two main approaches to their filtering . The first is component-wise processing, i.e., if each component image is processed as single-channel image . This approach is, in general, simpler. There exist more methods. It is easier to take into account speckle properties including its spatial spectrum. Processing of multichannel images can be done in parallel.
The second approach is 3D (vectorial) processing [20, 29]. This group of filters is able to take correlation of image components into account. This, under certain conditions, allows improving the efficiency of noise suppression. However, these methods can run into problems of different properties of noise (noise variance can be not the same, spectrum can be not identical, etc.). Then, either special methods of component image preprocessing are needed , or the use of 3D processing benefits in full extent becomes impossible.
Below we consider representatives of both groups of methods. All studied filters are based on DCT. The reasons for this are the following. Firstly, DCT is one of the best data decorrelating transforms approaching Karhunen-Loeve transform. This property is important in denoising and data representation. Secondly, DCT denoising is carried out in sliding blocks . Due to this, it is easily adapted to signal-dependent noise  and its spatial correlation . Thirdly, 2D DCT is a standard operation in image processing (in particular, in compression), and its efficient software and hardware implementations exist.
Thus, in addition to known despeckling techniques available in ESA SNAP toolbox, we also study filtering techniques based on DCT. For all modifications considered below, the main principles of processing are the same. Denoising is carried out in blocks of square shape, usually of size 8×8 pixels although other variants are possible. In each block, direct 2D DCT is performed. Then, amplitudes of obtained DCT coefficients are compared to thresholds, and one or another type of threshold operation is accomplished (the coefficient with indices (1,1) is remained unchanged in any case). Then, inverse 2D DCT is done, and as the result, filtered values are obtained for all pixels that belong to a given block.
There are different modifications of DCT-based denoising that concern block overlapping and threshold type. Denoising can be done with nonoverlapping, partly overlapping, and fully overlapping blocks. In the latter case, neighbor blocks are shifted by only one pixel with respect to each other in horizontal or vertical direction. Processing with full overlapping of blocks takes more time compared to processing with partly overlapping or nonoverlapping blocks. But the use of fully overlapping blocks provides the best efficiency of filtering in terms of standard quantitative criteria as peak signal-to-noise ratio (PSNR) and visual quality metrics. So, we will further employ this variant of DCT-based denoising. Note that for most image pixels, one has 64 filtered values coming from different block positions. They are averaged (although other methods of aggregation are possible).
There are hard, soft, and combined types of threshold. Below we employ the former type since hard thresholding is the most simple and efficient. This means that if a DCT coefficient is larger than a corresponding threshold, then it is remained unchanged. Otherwise, a zero value is assigned.
Multiplicative nature of speckle and its spatial correlation can be taken into consideration in different manners. Consider possible variants more in detail. If processing is component-wise, signal-dependent (locally adaptive) and frequency-dependent thresholds can be used. The local thresholds in an nth block are determined as
where is local noise standard deviation for the nth block that can be approximately calculated as (index k is omitted for simplicity) and is the filter parameter usually set equal to 2.7 for hard thresholding . This approach to filtering will be further referred as Filter 1.
Another way of component-wise processing mentioned earlier is based on utilizing homomorphic or variance-stabilizing transforms. A general form of logarithmic-type homomorphic transform (HT) is [7, 33] where c and d are transform parameters that are both larger than unity. These parameters can be chosen in different ways depending upon certain conditions. The main aspect here is that in multilook SAR images subject to such a HT, speckle converts to pure additive noise with non-Gaussian PDF. Its variance can be calculated as .
After this direct HT, a DCT-based filter version for additive noise is applied. Since additive noise is spatially correlated as well and its normalized DCT spectrum is practically the same as for original data, frequency-dependent thresholds are determined as
where the recommended is the same as earlier. After filtering, inverse HT is performed, and it is desired to carry out mean level correction . This variant of filtering will be further referred as Filter 2. According to our studies, it performs slightly worse compared to Filter 1. One reason is that additional distortions are introduced by direct and inverse HTs.
Joint processing of two component images that we plan to perform for the considered application can be carried out in two ways. The first one  is to use one component as basic one and to perform data correction for the second component image with providing equal mean values for each block of these images. Then, 3D DCT is performed in each block, and is set according to (4) where from the basic image is used for its calculation.
Another, practically equivalent, variant is to carry out variance-stabilizing transforms separately for each component images in such a manner that . If as we have in our case, parameters of logarithmic transform c and d can be the same. After this, 3D is done in each block, then thresholding follows, after this inverse 3D DCT is carried out. Due to separability of 3D transform into two-element 1D DCT and 2D DCT, obtaining of sum and difference images is possible. Then, their separate processing is applied with further inverse two-element DCT for each pixel. Threshold (8) is used in this case. The described denoising method is further called Filter 3.
There is one condition that should be satisfied for both presented 3D filtering techniques. Normalized spectra for component images should be the same or, at least, almost identical. Preliminary analysis carried out in Section 2 has demonstrated that this condition is satisfied. So, we have averaged estimates of normalized spectra obtained for all analyzed fragments of 19 VV and 19 VH radar images.
Then, the proposed methods of efficiency prediction are able to either predict the ratio where
and is the kth component image after despeckling or improvement of PSNR (IPSNR) equal to [35, 36]. Experiments carried out with 512×512 pixel fragments of Sentinel SAR images have shown that IPSNR for Filter 1 varies within the limits from 5 to 12 dB, i.e., images enhance sufficiently.
Let us prove this by two examples. Figure 4a presents original image with VV polarization. The output of Filter 1 is shown in Figure 4b. In turn, Figure 4c represents the output of Filter 2. Finally, Figure 4d shows the output of the modification  applied within the scheme  instead of standard BM3D (block matching three-dimensional) filter. Note that this is only an example for the latter filter applied to image fragment of limited size. We have not applied it to processing large-size Sentinel data further subject to classification since this filter requires too much time for image denoising.
The filtering results represented in Figure 4 look quite similarly. Efficient speckle suppression is provided, while edge/detail preservation is good enough. Visual inspection carried by us and quantitative data in  show that the output of Filter 2 is the most smeared and some artifacts are present. Due to linearity nature of the main used computational procedures (DCT) of the considered filters, all computational costs can be easily reduced using parallel-wised algorithms or hardware. On the following computer configuration—Intel Core i7, 16 GB RAM—the entire calculation time for one image (recall that average size of images is ~25k × 16k pixels) is about 17 min. We also note that such time can be reduced by realizing the considered denoising methods in low-level programming language due to filter simplicity.
We have also analyzed filter outputs for 3D despeckling methods. One example is given in Figure 5. Original image of VV polarization is presented in Figure 5a. Again, a good compromise between speckle suppression and edge/detail preservation is provided.
However, visual analysis is not able to draw conclusions what filter will suit the final goal of reaching high classification accuracy in the best way. Thus, three denoising techniques (Filters 1, 2, and 3) described above have been used to improve SAR image quality before classification. Note that for multitemporal SAR data, more sophisticated 3D filtering methods can be applied, but our task here is to establish connection between filtering efficiency and classification accuracy. We expect that the use of Filter 2 and Filter 3 will produce comparable or better classification than Filter 1 and other filters available in known toolboxes.
4. Classification of multichannel radar images
There are many classifiers that can be applied. Popular ones such as support vector machine (SVM), decision tree (DT), and random forest (RF) classifiers were often employed for remote sensing image classification in the recent years. Many papers have demonstrated comparable or better performance of neural networks (NN) [37, 38, 39, 40]. Although the NN training can be resource- and time-consuming, the corresponding classifiers have several positive features compared to SVM and DT. In particular, NNs are usually fast at processing new data that is important in processing of large volumes of radar data. Besides, they are able to produce probabilistic outputs, which can be employed for characterizing reliability (quality) of the final results (classification map). Hence, for classification, we have used a specific type of NN—an ensemble of feedforward neural networks, namely, multilayer perceptrons (MLPs). This type of classifiers was validated earlier for five Joint Experiment of Crop Assessment and Monitoring (JECAM) test sites in several countries, and it outperformed other considered techniques [41, 42]. The architecture of the used approach is represented in Figure 6. A committee (ensemble) of MLPs was used to improve performance of individual classifiers [9, 10]. The committee was obtained using a variant of the bagging technique  where MLPs with different parameters were trained using the same training data. This approach has several advantages. It is rather simple, non-computationally intensive and proved to be efficient for various applications. Outputs from the used MLPs were integrated using the method of average committee. According to this technique, the average class probability over all elementary classifiers is calculated, and the class that has the highest average posterior probability for the given input sample is chosen . The following equation describes this procedure:
where is the class to which the input sample is assigned by the committee of classifiers, denotes the resulting posterior probability of the committee, is posterior probability for each of MLPs, Q is the total number of classifiers in the committee, and C denotes the number of classes. There are two differences between average committee procedure and a majority voting technique: (i) the former approach gives probabilistic output, which can be employed as an indicator of reliability for classifying a particular pixel or area; (ii) there is no ambiguity when two or more classes have got the same number of “votes.”
Let us give some details. Our MLP classifier used hyperbolic tangent activation function for neurons in the hidden layer and logistic activation function in the output layer. We utilized a cross-entropy (CE) error function for training our neural network. This provided better performance in terms of speed of training and classification accuracy [9, 44]:
where E(w) denotes the CE error function that is dependent on the neurons’ weight coefficients w, T is the set of vectors of target outputs in the training set that is composed of S samples, and tsc and ysc denote the target and MLP outputs, respectively. In the target output for class c, all components of vector ts are set to 0, except for the cth component, which is set to 1. The CE error E(w) had to be minimized. This was done by the scaled conjugate gradient algorithm by varying weight coefficients w. Backpropagation method was utilized for NN training. To prevent NN overfitting, we used early stopping and weight decay (L2 regularization) techniques.
Classification of multitemporal radar images was performed on a per-pixel basis . We have considered all SAR images available during the crop growth period. Note that the use of multitemporal images allows more accurate classification of crops than a single-date image.
5. Classification results
Filtering approaches for Sentinel-1 images were verified in the south part of Ukraine (yellow rectangle in Figure 2). Zoomed results after different filters are shown for fragment (marked by star) in Figure 2 that has coordinates (33.73052, 47.18544). Such a territory is an intensive agricultural area; it corresponds to different climatic zones (humid continental) and has many different land cover types (forest on the north and agricultural crop in the southern part). The crop calendar is September–July for winter crops and April–October for spring and summer crops. A typical field size is 30–250 ha.
We preprocessed time series of 10 Sentinel-1A images with different filters available in ESA SNAP toolbox and Filter 1, Filter 2, and Filter 3. The comparison of those filters with non-filtered image for July 7, 2016, is shown in Figure 7. It is not easy to evaluate quality of filtering visually (user’s opinions can be subjective). Therefore, we provide crop classification maps based on the same in situ data for time series of images obtained by each filter. For each time series of images, we independently trained an ensemble of neural networks. For this, we collected 153 data samples for nine classes, and all the accuracies were evaluated on independent set, which consisted of 146 samples for nine classes. Under the term “sample,” we assume one polygon that consists of homogeneous pixels and all of them relate to the same class.
For ground-truth data collecting, we have used the along the roads approach. During these trips, we have collected georeferenced type of crops for definite fields. After that, collected data were divided into two independent datasets: one was used for training and another one for validation (as ground-truth data). So, for training and validation, we have used independent datasets with 153 and 146 samples, respectively.
The obtained crop classification maps after applying different filters are shown in Figure 8. In Tables 1 and 2, we present the comparison of user accuracy (UA), producer accuracy (PA), overall accuracy (OA), and Kappa coefficient for all classes  from crop classification map using different filters for SAR data in 2016.
|Class||No filter||Boxcar||Frost||Gamma map||IDAN||Lee|
|UA, %||PA, %||UA, %||PA, %||UA, %||PA, %||UA, %||PA, %||UA, %||PA, %||UA, %||PA, %|
|Class||Lee sigma||Median||Refined Lee||DCTF||DCTF HT||3D DCTF|
|UA, %||PA, %||UA, %||PA, %||UA, %||PA, %||UA, %||PA, %||UA, %||PA, %||UA, %||PA, %|
The overall accuracy of the crop classification map without applying any filtering is 82.6%. The lowest accuracy among all the available filters is provided by Median filter; classification accuracy is higher by +3.2% than classification of original (non-filtered) data. The proposed Filter 1 and Filter 2 have similar results with a nonsignificant gain of +0.3 and +0.1% compared to refined Lee, respectively. The most accurate crop map was obtained based on images preprocessed by the Filter 3. The gain of using this method is +6.1% compared to classification of original data and +1.3% compared to the refined Lee filter. It is worth noting that Filter 3 positive gains were observed not only for overall accuracy but also for PA and UA for each class, excluding forest and bare land.
One aim of this study was to analyze properties of Sentinel SAR images. It has been shown that speckle is quite intensive and spatially correlated. These peculiarities have been taken into account in choosing proper filters for data preprocessing.
Another aim was to compare the impact of different filters for SAR imagery denoising on crop classification accuracy. Ten SAR images with VV and VH polarization acquired by Sentinel-1A satellite for the Ukraine territory were explored. For speckle suppression, all available filters from the ESA SNAP toolbox were evaluated along with those based on DCT, namely, DCTF, DCTF HT, and 3D DCTF. Filter quality estimation has been performed in terms of overall crop classification accuracy. An ensemble of feed forward neural networks, in particular MLPs, was used as this method was validated for multiple JECAM test sites and proved to be efficient for multitemporal image classification.
The best performance using available filters in SNAP toolbox in terms of overall accuracy was achieved by the refined Lee filter. At the same time, all proposed DCT-based filters outperformed common filters from the SNAP toolbox. 3D DCTF provided the most accurate crop classification map after removing the speckle, and that is essential for crop area estimation and for solving other applied problems. It should be noted that 3D DCTF improved not only overall accuracy but also PA and UA for each class (especially for summer crops), excluding forest and bare land.