Goodness of fit (*R*^{2}) of the obtained approximations

## Abstract

In processing multichannel remote sensing data, there is a need to automate basic operations such as filtering and compression. Automation presumes deciding whether image filtering is expedient. It also involves obtaining information on which certain decisions can be based or from which parameters of processing algorithms can be chosen. For the considered operations of denoising and lossy compression, it is shown that their basic performance characteristics can be predicted quite easily from simple local statistics calculated in the discrete cosine transform (DCT) domain. The described prediction methodology is shown to be general and applicable to different types of noise, provided that the basic noise characteristics are known in advance or pre-estimated accurately.

### Keywords

- Multichannel remote sensing data
- automatic processing
- denoising
- lossy compression
- performance prediction
- DCT

## 1. Introduction

Remote-sensing (RS) data are widely used in numerous applications [1], [2]. Primary RS images acquired onboard airborne or spaceborne carriers and intended for Earth surface monitoring are usually not ready for direct use and are therefore subject to preprocessing. This preprocessing can be carried out in several stages and includes operations such as geo-referencing and calibration, blind estimation of noise/distortion characteristics, pre-filtering, and lossless or lossy compression [1], [2]. These operations can be distributed between onboard and on-land computing facilities (processors) in different ways depending upon many factors [3-5].

Regardless of how the functions are distributed, onboard operations are usually performed fully automatically (although some algorithm parameters can be changed by command from Earth). In turn, operations carried out on land can, in general, be performed interactively, relying on the labor of highly qualified experts. However, a certain degree of automation of on-land data processing is required as well. The need for automation is especially high for multichannel (e.g., hyperspectral) RS data [6], where the number of channels (components, sub-bands) can reach hundreds. Such RS images have become popular and widely available due to their potential ability to provide rich information for various applications [6], [7].

Meanwhile, the multichannel nature of RS data creates new problems in their processing [3], [8]. The main problems and open questions are the following:

- How to manage large volumes of acquired data with maximal or appropriate efficiency (here, different criteria of efficiency can be used)?
- Is it possible to skip some data-processing operations if their efficiency is not high and, consequently, they are not worth performing?

The latter question is the main one addressed below. It is closely connected with the following questions:

- Is it possible to predict the performance of some standard operations of RS data (image) processing?
- What is the accuracy of such a prediction, and is it high enough to decide to skip an operation or to set a certain value of some parameter used in the image-processing chain [9]?

This chapter will focus on two typical operations of multichannel RS data processing, namely, filtering and lossy compression. While considering them, the fact that the acquired images are noisy is taken into account. One can argue that noise is not seen in many RS images (or components of these images). This is true, and noise cannot be observed in approximately 80% of the visualized sub-band images of hyperspectral data. This is explained by the peculiarities of human vision, which does not see noise if peak signal-to-noise ratio (PSNR) in a given single-channel (component) image exceeds 32–38 dB. However, recent studies [7], [10-12] have demonstrated that noise is present in all sub-band images and this is due to the principle of operation of hyperspectral imagers.

Moreover, it has been shown in [10], [11] that the noise can be of quite a complex nature and that the noise in multichannel RS images has specific properties. First, it is signal-dependent [10], [11], [13]. Second, its intensity differs substantially between sub-bands [14]. More precisely, the wide variation of dynamic range and noise intensity across sub-band images jointly leads to a wide range of signal-to-noise ratio (SNR) in components of multichannel images. This has led to the use of the term “junk bands” [15] and to different strategies of coping with noisy channels in multichannel data. Some researchers prefer to use these sub-bands in further processing while others propose to remove them; whether they can be filtered is also discussed [15]. It has been shown that if filtering of these junk bands is efficient, it can improve the classification of hyperspectral data [16]. However, the aforementioned questions concern the efficiency of image preprocessing and its prediction.

The questions raised can be partly answered with results obtained in recent research. The objective is to show that important performance parameters of image denoising and/or lossy compression can be predicted quickly and quite accurately using simple input parameter(s) and dependences obtained in advance. The obtained results are divided into two parts. The first part deals with the prediction of filtering efficiency. This research started in 2013 [17] and has its origins in the study [18]. The second part relates to the compression of noisy images [19], [20]. In fact, the results obtained for predicting the parameters of lossy compression rely on the same principle as that used for image filtering and can serve as a basis for further research.

Before considering the image performance criteria and preprocessing techniques, several points should be noted. First, two assumptions are made: the noise type is known or determined in advance, and its parameters are either known or accurately pre-estimated. Currently, there are quite a few efficient methods for estimating the parameters of pure additive noise [8], [21-25], speckle noise [26], and different types of signal-dependent noise [10-12], [27], [28]. The noise parameters are taken into account by most modern filtering techniques, which belong to the families of orthogonal-transform-based filters [29-33] and nonlocal filters, for example, block-matching and three-dimensional filtering (BM3D) [34]. The same relates to modern methods of lossy compression of noisy images [19], [35].

Second, we restrict ourselves to image filtering and compression techniques based on the discrete cosine transform (DCT). There are several reasons for this. DCT is a powerful orthogonal transform widely exploited in image processing. Filters and compression techniques based on DCT are currently among the best [34]. They can be quite easily adapted to signal-dependent noise directly [32], [36] or equipped with proper variance-stabilizing transformations (VST) [19], [32], [37]. This restriction does not mean that the prediction approach cannot be applied to other filtering and lossy compression techniques. The approach should be applicable (with certain modifications), but this is yet to be thoroughly checked.

Third, in the analysis of the prediction approach, traditional quality metrics such as mean square error (MSE) and peak signal-to-noise ratio (PSNR) are employed, as well as visual quality metrics such as the PSNR human visual system masking metric (PSNR-HVS-M) [38]. The behavior and properties of traditional metrics are well understood by those dealing with image processing. Although PSNR-HVS-M is less popular, it is one of the best metrics that take into account the peculiarities of the human visual system (HVS), and it can be calculated for either one component or a group of components of a multichannel image. It is expressed in dB and is usually either slightly smaller than PSNR (for annoying types of distortions like spatially correlated noise) or larger than PSNR (if distortions are masked by texture). This is important since we assume that multichannel images are processed either component-wise or in groups of channel images, where, in the limiting case, a group includes the entire image.

Fourth, other criteria of image-processing efficiency, such as classification accuracy, object detectability, etc., are important for the preprocessed RS data. We are unable to predict them, but recent research shows [39] that these criteria are connected with the traditional criteria of image processing. Thus, it is expected that if good values of conventional and HVS metrics are provided due to preprocessing, appropriate classification accuracy and other criteria will be attained.

## 2. The considered image-performance criteria and preprocessing techniques

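This chapter considers the following model of an observed multichannel image:

$$I^{n}_{kij} = I^{true}_{kij} + n_{kij}\left(I^{true}_{kij}\right), \quad i = 1,\dots,I,\; j = 1,\dots,J,\; k = 1,\dots,K, \tag{1}$$

where $I^{n}_{kij}$ is the *ij*-th sample of the *k*-th component of a multichannel image, $n_{kij}$ is the *ij*-th value of the noise in the *k*-th component, whose statistics are, in general, supposed to depend on the true image value $I^{true}_{kij}$, *I* and *J* define the image size, and *K* denotes the number of channels. The dynamic range and the noise variance of the *k*-th channel image are denoted $D_k$ and $\sigma_k^2$, respectively. It is also possible to assume that noise is of the same type and that neighbor channels have quite close values of input MSE (equal to the noise variance $\sigma_k^2$ in the case of pure additive noise)

$$MSE^{inp}_{k} = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J}\left(I^{n}_{kij} - I^{true}_{kij}\right)^{2} \tag{2}$$

and input PSNR

$$PSNR^{inp}_{k} = 10\log_{10}\left(D_k^{2} / MSE^{inp}_{k}\right). \tag{3}$$

The same assumptions are valid for the input visual quality metric

$$PSNR\text{-}HVS\text{-}M^{inp}_{k} = 10\log_{10}\left(D_k^{2} / MSE^{inp}_{HVS,k}\right), \tag{4}$$

where $MSE^{inp}_{HVS,k}$ is the MSE calculated with the peculiarities of HVS taken into account [38].

After applying a considered filter, one obtains a filtered image $I^{f}_{kij}$. The output values $MSE^{out}_{k}$, $PSNR^{out}_{k}$, and $PSNR\text{-}HVS\text{-}M^{out}_{k}$ are determined in the same way, for example,

$$MSE^{out}_{k} = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J}\left(I^{f}_{kij} - I^{true}_{kij}\right)^{2}. \tag{5}$$

Then, one has to characterize the efficiency of filtering. One way to do this is to use

$$\kappa_k = MSE^{out}_{k} / MSE^{inp}_{k}, \tag{6}$$

$$IPSNR_k = PSNR^{out}_{k} - PSNR^{inp}_{k}, \tag{7}$$

$$IPHVSM_k = PSNR\text{-}HVS\text{-}M^{out}_{k} - PSNR\text{-}HVS\text{-}M^{inp}_{k}. \tag{8}$$

Small values of the ratio in expression (6) and large values of expressions (7) and (8), both expressed in dB, are evidence in favor of efficient filtering.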
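The efficiency criteria can be computed for a simulated image as in the following minimal sketch. It assumes that expression (6) is the ratio of output to input MSE and that expression (7) is the PSNR improvement in dB; the function names and the default dynamic range of 255 are illustrative choices, and PSNR-HVS-M is omitted because it requires a dedicated implementation [38].

```python
import numpy as np


def mse(reference, test):
    """Mean square error between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    return float(np.mean((reference - test) ** 2))


def psnr(reference, test, dynamic_range=255.0):
    """PSNR in dB; the dynamic range of 255 is an assumption for 8-bit data."""
    return 10.0 * np.log10(dynamic_range ** 2 / mse(reference, test))


def filtering_efficiency(true_img, noisy_img, filtered_img, dynamic_range=255.0):
    """Return (kappa, ipsnr): output/input MSE ratio and PSNR improvement in dB."""
    kappa = mse(true_img, filtered_img) / mse(true_img, noisy_img)
    ipsnr = (psnr(true_img, filtered_img, dynamic_range)
             - psnr(true_img, noisy_img, dynamic_range))
    return kappa, float(ipsnr)
```

Under these definitions the two criteria carry the same information, since the PSNR improvement equals −10·log10 of the MSE ratio; a ratio close to unity corresponds to an improvement close to 0 dB.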

Similarly, after lossy compression, one obtains

occurs to be less than

on QS for the lossy DCT-based coder AGU [42] for two known gray-scale test images Airfield (Fig. 1(b)) and Frisco (Fig. 1(c)) corrupted by additive white Gaussian noise (AWGN) with variance

The lossy compression in the neighborhood of OOP has obvious advantages. Compressed images have high quality, and, at the same time, they have CR considerably larger than for lossless compression [9], [44]. Because of these benefits, the lossy compression of noisy images in the OOP neighborhood is considered. If OOP does not exist, nevertheless, the recommended setting

where

Certainly, there are also other valuable performance criteria. For image pre-filtering, it is important to know the computational efficiency of the denoising method and how easily it can be implemented, especially onboard. For lossy image compression, it is important to know the CR provided and how easily it can be attained. To partly address these issues, the filtering and compression techniques are briefly described.

DCT-based filtering [18], [30] is performed in a block-wise manner, where 8 × 8 pixels are a typically set block size. Filtering can be performed with nonoverlapping, partly overlapping, and fully overlapping blocks. In the latter case, filtering efficiency (expressed in improvement of PSNR (

There are three main steps in processing: direct 2D DCT in each block; thresholding of DCT coefficients; and inverse DCT applied to the thresholded coefficients. Then, the filtered data from overlapping blocks are aggregated. Within this structure, different variants of thresholding are possible, but hard thresholding is preferred: DCT coefficient values remain unchanged if their amplitudes exceed a threshold and are set to zero otherwise. If one deals with AWGN, the threshold is fixed as

$$T = \beta\sigma, \tag{14}$$

where $\sigma$ is the noise standard deviation.

For spatially uncorrelated signal-dependent noise with *a priori* known or accurately pre-estimated dependence of local standard deviation on local (block) mean, $\sigma_{bl} = f(\bar{I}_{bl})$, the threshold is locally adaptive:

$$T_{bl} = \beta f(\bar{I}_{bl}). \tag{15}$$

Finally, for spatially correlated and signal-dependent noise with *a priori* known or properly pre-estimated normalized DCT spectrum $W_{norm}(q,s)$, where *q* and *s* are indices of DCT coefficients in blocks [33], the thresholds are locally adaptive and frequency dependent:

$$T_{bl}(q,s) = \beta f(\bar{I}_{bl})\sqrt{W_{norm}(q,s)}. \tag{16}$$

In expressions (14–16), *β* is the parameter. Depending upon the image complexity and noise intensity, its optimal value can vary a little [18], but the recommended choices are *β* = 2.6 to provide good filtering according to PSNR and *β* = 2.3 to ensure quasi-optimal denoising according to visual quality metrics. Below, *β* = 2.6 will be used. A 3D version of the DCT-based filter [39] performs similarly. The difference is that the blocks are 3D, of size 8 × 8 × *K*_{gr}, where *K*_{gr} ≤ *K* denotes a channel group size.
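The block-wise procedure for the AWGN case can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes a threshold equal to *β* times the noise standard deviation, fully overlapping 8 × 8 blocks, plain averaging of overlapping estimates, and preservation of the DC coefficient (block mean), which is an additional assumption. The function name `dct_denoise` is hypothetical.

```python
import numpy as np
from scipy.fft import dctn, idctn


def dct_denoise(noisy, sigma, beta=2.6, block=8):
    """Hard-thresholding DCT filter with fully overlapping blocks (AWGN case)."""
    noisy = np.asarray(noisy, dtype=np.float64)
    rows, cols = noisy.shape
    accumulated = np.zeros_like(noisy)   # sum of block estimates per pixel
    counts = np.zeros_like(noisy)        # number of blocks covering each pixel
    threshold = beta * sigma
    for r in range(rows - block + 1):
        for c in range(cols - block + 1):
            # Step 1: direct 2D DCT (orthonormal, so noise std stays sigma).
            coeffs = dctn(noisy[r:r + block, c:c + block], norm="ortho")
            dc = coeffs[0, 0]
            # Step 2: hard thresholding, small coefficients are zeroed.
            coeffs[np.abs(coeffs) <= threshold] = 0.0
            coeffs[0, 0] = dc            # assumption: the block mean is kept
            # Step 3: inverse DCT, then aggregation over overlapping blocks.
            accumulated[r:r + block, c:c + block] += idctn(coeffs, norm="ortho")
            counts[r:r + block, c:c + block] += 1.0
    return accumulated / counts
```

The double loop is written for readability; the image must be at least one block in each dimension so that every pixel is covered by at least one block.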

Conventional BM3D [34] is a more sophisticated denoising method. It involves a search for similar patches (blocks), their joint processing in a 3D manner using the DCT and the Haar transform, and a post-processing stage. This filtering principle, originally designed to cope with AWGN in gray-scale images, was later adapted to signal-dependent noise after a proper VST [37], spatially correlated noise [45], and color (three-channel) images corrupted by AWGN [46]. BM3D and its modifications provide slightly better performance than the corresponding modifications of conventional DCT-based denoising, at the expense of considerably more extensive computations.

The lossy compression technique called AGU [42] is based on DCT in 32 × 32 pixel blocks, a more efficient (compared to JPEG) coding of quantized DCT coefficients, and post-processing to remove blocking artifacts after decompression. This coder is quite simple but slightly more efficient than JPEG 2000 or set partitioning in hierarchical trees (SPIHT) in the rate/distortion sense. The coder has a 3D version [19], and CR for both the 2D and 3D versions is controlled by QS.

## 3. Prediction of filtering efficiency

The main idea of filtering efficiency prediction is the following [17]. Suppose there is some input parameter(s) able to jointly characterize image complexity and noise intensity and also there is some output parameter(s) capable of adequately describing the image denoising efficiency. Assume that there is a rather strict connection between these input and output parameters that allows predicting output value(s) having input value(s).

An additional assumption (and a requirement for prediction) is that the input parameter(s) can be calculated easily and quickly, faster than denoising itself (otherwise, prediction becomes useless). If all these assumptions are valid, it becomes possible to determine a predicted output value before starting image filtering and to decide whether it is worth filtering a given image (component) or not. Another decision can relate to setting the parameter(s) of the filter used. For example, if a processed image appears to be textural (of high complexity), the filter parameter(s) can be adjusted to provide better edge/detail/texture preservation; for the DCT-based filter, the parameter *β* can be set equal to 2.3.

Keeping these general principles in mind, we have to address several tasks:

- What is a good (in the best case, optimal) input parameter (or set of parameters)?
- What is a good (proper, acceptable) output parameter (or set of parameters) that adequately characterizes the filtering efficiency and supports a decision (whether to use filtering, how to set a filter parameter, etc.)?
- How can the dependence between output and input parameters be obtained, and how accurate is it?

These questions are partly answered below and the outcomes obtained in design and performance analysis of prediction techniques are described. We believe that a partial answer to the second question is the following. The ratio in expression (6) as well as the parameters

### 3.1. Input and output parameter sets testing and comparison

Based on the outcomes of the study [18], Abramov et al. [17] observed in 2013 that there is a dependence between the filtering efficiency expressed by (6) and simple statistics of DCT coefficients determined in 8 × 8 blocks. Two probability parameters were considered. The first one, denoted as *P*_{2σ}, is the mean probability that the amplitudes of DCT coefficients are not larger than 2*σ*, where *σ* denotes the standard deviation of additive white Gaussian noise. This parameter originated from an analogy with the well-known sigma filter [47]. The second parameter, denoted as *P*_{2.7σ}, is the mean probability that the amplitudes of DCT coefficients are larger than 2.7*σ*. Here, there is an obvious analogy with hard thresholding in the DCT-based filter, where the recommended *β* = 2.7.

At the starting point, the authors of [17] had no information on the optimality of the input parameters. The objective was just to check whether prediction is possible in principle, using a restricted set of 18 gray-scale test images and three standard deviations of AWGN (5, 10, and 15). The data were presented as scatterplots, where the *Y*-axis reflects the ratio in expression (6) and the *X*-axis corresponds to the considered statistical (input) parameter (either *P*_{2σ} or *P*_{2.7σ}). These scatterplots are shown in Fig. 2. The scatterplot points cluster well along the fitted lines (for simplicity, second-order polynomials were used). Interestingly, small *P*_{2σ} and large *P*_{2.7σ} correspond to images of complex structure corrupted by low-intensity noise. In this case, the efficiency of image filtering is low (the ratio in expression (6) is close to unity, see Fig. 2). This agrees with the theory of filtering [48], [49], which shows that efficient filtering of textural images is problematic for any existing filter, including the most sophisticated nonlocal ones [34].
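The two input parameters can be estimated as in the sketch below. It is an illustration under stated assumptions: the DCT is orthonormal (so that AWGN with standard deviation *σ* yields DCT coefficients with the same standard deviation), the DC coefficient of each block is excluded because it reflects the block mean rather than noise, and the `step` argument (hypothetical) controls block overlap.

```python
import numpy as np
from scipy.fft import dctn


def dct_probabilities(image, sigma, block=8, step=8):
    """Return mean P_2sigma and P_2.7sigma over blocks of a noisy image."""
    image = np.asarray(image, dtype=np.float64)
    rows, cols = image.shape
    p2_local, p27_local = [], []
    for r in range(0, rows - block + 1, step):
        for c in range(0, cols - block + 1, step):
            coeffs = np.abs(dctn(image[r:r + block, c:c + block], norm="ortho"))
            ac = coeffs.ravel()[1:]      # assumption: DC coefficient is skipped
            # Local estimates: fraction of amplitudes not larger than 2*sigma
            # and fraction of amplitudes larger than 2.7*sigma.
            p2_local.append(np.mean(ac <= 2.0 * sigma))
            p27_local.append(np.mean(ac > 2.7 * sigma))
    return float(np.mean(p2_local)), float(np.mean(p27_local))
```

With `step=8` the blocks do not overlap, which keeps the calculation much cheaper than the filtering itself; `step=1` gives fully overlapping blocks.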

The results of the study conducted in [17] have also shown the following. First, quality of fitting has to be characterized quantitatively. For this purpose, the approach [50] works well. It provides the parameter (coefficient of determination) *R*^{2} that tends to unity for perfectly fitted curves and root mean square error (RMSE) of fitting that should be as small as possible. These parameters are strictly connected with prediction accuracy. For perfectly determined *P*_{2σ} or *P*_{2.7σ}, RMSE of fitting directly describes the accuracy of prediction.
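The two fitting-quality indicators can be computed as in this short sketch, which assumes a polynomial fit of a scatterplot (second order by default, as used for Fig. 2); the function name `fit_quality` is illustrative.

```python
import numpy as np


def fit_quality(x, y, degree=2):
    """Fit a polynomial to scatterplot points and return (coeffs, R^2, RMSE)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    ss_res = np.sum(residuals ** 2)              # unexplained variation
    ss_tot = np.sum((y - np.mean(y)) ** 2)       # total variation
    r_squared = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residuals ** 2))
    return coeffs, float(r_squared), float(rmse)
```

Here *x* holds the input parameter values (for example, *P*_{2σ}) and *y* the measured metric for each test image and noise level.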

The conclusions drawn in [17] can be recalled here. First, the prediction of filtering efficiency for BM3D is less accurate than for the conventional DCT-based filter. This conclusion has been confirmed in later studies. It is associated with the use of two denoising mechanisms in BM3D (DCT denoising and a search for similar blocks with their joint processing), where the latter mechanism has no connection to DCT statistics. Second, although the prediction accuracy for both *P*_{2σ} and *P*_{2.7σ} is quite good (*R*^{2} > 0.9 and RMSE < 1.0), the probability *P*_{2σ} provides noticeably better prediction (quality of fitting) than *P*_{2.7σ}. This suggests that other input parameters may also be used. Third, different types of fitting functions (polynomials, power and exponential functions) were able to provide approximately the same quality of fitting (the fitted curve in Fig. 2(a) is one example). Fourth, *P*_{2σ} and *P*_{2.7σ} can be determined with appropriate accuracy not from all possible overlapping blocks but from partly overlapping or even nonoverlapping blocks, provided that their total number is not less than 300...500. This makes prediction even faster compared to conventional DCT-based filtering.

Some further observations were made later (in the last two years). First, some restrictions should be imposed on the approximating function. For example, it is clear that the ratio in expression (6) cannot be negative. It is also clear that an approximating (fitting) function should be defined for all possible values of its argument. Since probabilities serve as arguments, they can vary from zero to unity. Meanwhile, the arguments in both scatterplots in Fig. 2 vary within narrower limits. Besides, curve fitting benefits from scatterplot points that are distributed with approximately uniform density along the argument axis.

These requirements have been satisfied by using considerably more test images (including highly textural ones) and a wider set of noise standard deviations (including quite small ones). This has allowed obtaining scatterplot points for small *P*_{2σ} and large *P*_{2.7σ}.

Examples of the obtained scatterplots and fitted curves for DCT-based denoising are shown in Fig. 3. As can be seen, the fitting is rather good and the coefficient of determination is approximately 0.95 (see the details below). We believe these are already good results that allow practical recommendations. For example, it is clearly seen that there is no reason to carry out filtering if *P*_{2σ} is smaller than 0.5, since the benefit obtained due to denoising is negligible (approximately 1 dB or less). Prediction itself is carried out as follows. Given the fitted curves obtained in advance as described above, one calculates *P*_{2σ} or *P*_{2.7σ} for a given image before filtering and substitutes it as the argument into the approximating function to obtain the desired metric that characterizes the predicted denoising efficiency.
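The prediction step and the resulting decision can be sketched as follows. The coefficients below are placeholders only: they describe a straight line through two values quoted in this chapter (an improvement of about 1 dB at *P*_{2σ} ≈ 0.5 and about 4 dB at *P*_{2σ} ≈ 0.8) and are not the fitted curves of Fig. 3. The function names and the 1 dB decision level are likewise illustrative assumptions.

```python
import numpy as np

# Placeholder curve (NOT the authors' fit): line through (0.5, 1 dB), (0.8, 4 dB).
ILLUSTRATIVE_COEFFS = np.array([10.0, -4.0])


def predict_ipsnr(p2sigma, coeffs=ILLUSTRATIVE_COEFFS):
    """Substitute the input parameter into a curve obtained in advance."""
    p = float(np.clip(p2sigma, 0.0, 1.0))        # probabilities lie in [0, 1]
    # The placeholder line is clipped at zero so that no loss is predicted.
    return max(float(np.polyval(coeffs, p)), 0.0)


def filtering_is_expedient(p2sigma, min_gain_db=1.0, coeffs=ILLUSTRATIVE_COEFFS):
    """Decide whether the predicted improvement justifies filtering."""
    return predict_ipsnr(p2sigma, coeffs) > min_gain_db
```

In practice, `coeffs` would be replaced by the coefficients of a curve fitted to scatterplot data for the filter in use.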

Expressions for the obtained approximations for the DCT filter are as follows (we give only the functions of

The values of *R*^{2} are presented in Table 1. The analysis confirms that it is better to use *P*_{2σ} than *P*_{2.7σ}. Prediction of *κ* is slightly more accurate than the prediction of *IPSNR.* However, the prediction of *IPHVSM* is worth improving.

| Metric | *P*_{2σ} | *P*_{2.7σ} |
|---|---|---|
| *κ* | 0.978 | 0.955 |
| *IPSNR* | 0.963 | 0.935 |
| *IPHVSM* | 0.82 | 0.78 |

It has been discovered that not only the mean of local (block) estimates of probability *P*_{2σ} is connected with predicted metrics [51], but the other statistical parameters of the distribution of local estimates can also be exploited to improve prediction. The general framework to obtain an estimate of a predicted metric by multiparameter fitting is described by the following formula:

where *a* and *b*_{i} are approximation factors, *O*_{i}, *i* = 1,...,*n*, is a parameter of the distribution of local estimates, and *n* is the number of such parameters. The distribution mean, median, mode, variance, skewness, and kurtosis can be used as *O*_{i}. The factors *a* and *b*_{i}, *i* = 1,...,*n*, have to be obtained in advance by multidimensional (*n*-dimensional) regression.
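The candidate input parameters can be computed from the set of local (block) probability estimates as in the sketch below. Two choices here are assumptions rather than details taken from the source: the mode is estimated from a histogram over [0, 1], and kurtosis is returned as excess kurtosis (the SciPy default). The regression that combines these parameters is not shown.

```python
import numpy as np
from scipy import stats


def distribution_parameters(local_estimates, bins=20):
    """Statistics of local probability estimates used as regression inputs."""
    p = np.asarray(local_estimates, dtype=np.float64)
    # Assumption: the mode is taken as the center of the fullest histogram bin.
    counts, edges = np.histogram(p, bins=bins, range=(0.0, 1.0))
    top = int(np.argmax(counts))
    mode = 0.5 * (edges[top] + edges[top + 1])
    return {
        "M": float(np.mean(p)),
        "Var": float(np.var(p)),
        "Med": float(np.median(p)),
        "Mod": float(mode),
        "S": float(stats.skew(p)),
        "K": float(stats.kurtosis(p)),   # excess kurtosis (Fisher definition)
    }
```

The local estimates are the per-block probabilities before averaging, so they are available at no extra transform cost once the block DCTs have been computed.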

The results of using multidimensional regression are presented in Table 2. The abbreviations used are the following: *M* – mean; *Var* – variance; *Med* – median, *Mod* – mode; *K* – kurtosis; *S* – skewness; all calculated for a set of local estimates of probability *P*_{2σ}. The results are given for both considered filters for the metrics *IPSNR* and *IPHVSM.* Only the best sets for *n* from 1 to 5 are presented since the joint use of all considered parameters is less efficient than five input parameters employed together.

| Filter | Metric | Statistical parameters | *R*^{2} |
|---|---|---|---|
| DCT filter | IPSNR | M | 0.963 |
| | | M, Var | 0.971 |
| | | M, Var, Mod | 0.974 |
| | | M, Var, Mod, K | 0.976 |
| | | M, Var, Med, Mod, S | 0.977 |
| | IPHVSM | Med | 0.848 |
| | | M, Var | 0.923 |
| | | M, Var, Med | 0.926 |
| | | M, Var, Med, S | 0.927 |
| | | M, Var, Med, Mod, S | 0.928 |
| BM3D | IPSNR | M | 0.95 |
| | | M, Var | 0.955 |
| | | M, Var, Mod | 0.959 |
| | | M, Var, Mod, S | 0.961 |
| | | M, Var, Med, Mod, S | 0.961 |
| | IPHVSM | Med | 0.845 |
| | | M, Var | 0.905 |
| | | M, Var, S | 0.905 |
| | | M, Var, S, K | 0.909 |
| | | M, Var, Med, S, K | 0.917 |

The conclusions are the following. The use of more input parameters leads to larger (better) *R*^{2} for both filters and both metrics. The benefit of using several input parameters instead of one is quite small for *IPSNR,* where *R*^{2} for one-parameter prediction is already quite high. Meanwhile, for the visual quality metric *IPHVSM,* the improvement is quite large. Interestingly, the use of median of local estimates instead of the mean considerably improves prediction (compare the data in Tables 2 and 1) for *IPHVSM* for the DCT-based filter and *P*_{2σ}.

More input parameters provide better prediction. At the same time, more time is needed for calculation of input parameters (although their calculation is not difficult). Then, a compromise solution could be the use of the dependence of the type

where

| Filter | Metric | *a* | *b*_{1} | *b*_{2} |
|---|---|---|---|---|
| DCT filter | IPSNR | 0.023 | 6.338 | 7.459 |
| | IPHVSM | 2.225·10^{−4} | 10.81 | 37.14 |
| BM3D | IPSNR | 0.019 | 6.591 | 6.849 |
| | IPHVSM | 5.324·10^{−5} | 12.42 | 41.36 |

The expression (20) is not the only way to combine several input parameters into a joint output. Neural networks (NN) are known to perform this task rather well and to be good approximators [52]. This property has been used by us in [53] to make the neural network predict the considered metrics based on multiple input parameters. The obtained results are practically the same as in Table 3. Therefore, there is no need to use a more complex NN approximator instead of expression (20).

A more reasonable solution is to look for better input parameters. Such a study has been conducted in [51]. It has been shown that the probability *P*_{0.5σ} is more informative than *P*_{2σ}, where *P*_{0.5σ} is the mean probability that the magnitudes of DCT coefficients in blocks are smaller than 0.5*σ*. Theoretically, for a Gaussian distribution, this probability does not exceed 0.38, and DCT coefficients of AWGN are Gaussian. Thus, the mean *P*_{0.5σ} approaches 0.38 only if the considered image is “very homogeneous” and the noise is intensive. This parameter is employed in the studies described below.
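The limiting value of 0.38 can be checked directly. For a zero-mean Gaussian variable with standard deviation *σ*, the probability that its magnitude is below *cσ* equals erf(*c*/√2); the short sketch below evaluates it (the function name is illustrative).

```python
import math


def gaussian_prob_within(c):
    """P(|x| < c * sigma) for a zero-mean Gaussian x with std sigma."""
    return math.erf(c / math.sqrt(2.0))


p_half = gaussian_prob_within(0.5)   # limit for P_0.5sigma, about 0.383
p_two = gaussian_prob_within(2.0)    # limit for P_2sigma, about 0.954
```

The same function gives the corresponding limit for *P*_{2σ} in a homogeneous region dominated by noise, which explains why the arguments of the scatterplots never reach unity.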

The results obtained for multiparameter fitting are presented in Table 4. The abbreviations are the same as in Table 2. The first observation is that even for one parameter (the mean of local probabilities), the values of *R*^{2} are considerably better than the corresponding values for *P*_{2σ}. Again, the results for the BM3D filter are slightly worse than for the DCT-based filter, and the results of predicting *IPHVSM* are worse than those for *IPSNR*. Again, the use of only two input parameters, the mean and variance of local estimates, seems to be a good practical choice. Thus, the best parameters of the function (21) for this case are presented in Table 5. In addition, Fig. 4 gives an example of scatterplot fitting by a 2D surface (function) for the two-parameter case, where the mean and variance of local estimates of the considered probability are used to predict *IPHVSM*.

| Filter | Metric | Statistical parameters | *R*^{2} |
|---|---|---|---|
| DCT filter | IPSNR | M | 0.986 |
| | | M, Var | 0.989 |
| | | M, S, K | 0.989 |
| | | M, Med, S, K | 0.989 |
| | | M, Var, Med, Mod, S | 0.99 |
| | IPHVSM | Mod | 0.844 |
| | | M, Var | 0.944 |
| | | M, Var, Mod | 0.949 |
| | | M, Var, Mod, S | 0.951 |
| | | M, Var, Med, Mod, S | 0.952 |
| BM3D | IPSNR | M | 0.975 |
| | | M, Var | 0.977 |
| | | M, Var, S | 0.978 |
| | | M, Var, Med, S | 0.978 |
| | | M, Var, Med, Mod, S | 0.978 |
| | IPHVSM | Mod | 0.852 |
| | | M, Var | 0.935 |
| | | M, Var, Mod | 0.939 |
| | | M, Var, Mod, S | 0.941 |
| | | M, Var, Med, Mod, S | 0.941 |

| Filter | Metric | *a* | *b*_{1} | *b*_{2} |
|---|---|---|---|---|
| DCT filter | IPSNR | 0.168 | 10.8 | 19.28 |
| | IPHVSM | 0.01 | 15.66 | 144.3 |
| BM3D | IPSNR | 0.148 | 11.33 | 17.7 |
| | IPHVSM | 0.004 | 18.25 | 161.7 |

### 3.2. Analysis for signal-dependent and spatially correlated types of noise

Let us define the models of signal-dependent noise used. According to a first model [7], [11], the expression (1) transforms to

where

As mentioned in Section 2 (expression no. 15), the local threshold is set as *P*_{2σ} is obtained as

where

Some of the results of the studies in our papers [54], [55] are presented next. One aspect specifically addressed in these studies was the influence of the image set used in forming a scatterplot. Two scatterplots were formed separately: one for the set of standard images used in optical image processing, such as Baboon, Barbara, and Lena, and one for the set of images called “Remote Sensing”, such as Frisco and Diego. The reason for this study was the following. Some members of the RS community strongly object to using standard gray-scale test images in their studies, although there are no commonly accepted sets of test RS images.

The methodology of obtaining the scatterplots was modified slightly. For the noise model (22), three different cases were simulated: prevailing influence of SI noise, dominant influence of SD noise, and comparable contribution of both components. As a result, a wide range of mean *P*_{2σ} has been provided. Scatterplot points that belong to different image sets are indicated by different markers (and different colors). There are also two fitted curves. We believe there is no essential difference between the scatterplots and the fitted curves. Thus, it can be concluded that the prediction is quite universal and suitable for both conventional gray-scale optical images and component-wise (single-channel) RS images. Moreover, it has been shown in [55] that prediction is valid for single-look SAR images corrupted by fully developed spatially uncorrelated speckle. It is also possible to compare the results in Fig. 5 with the data in Fig. 3(b). They are very similar. Fig. 5 shows that *IPSNR* is approximately 1 dB or less for *P*_{2σ} of approximately 0.5, in which case denoising is practically useless. Meanwhile, *IPSNR* is approximately 4 dB for *P*_{2σ} of approximately 0.8, in which case the use of filtering is expedient. The parameter *R*^{2} for both fitted curves in Fig. 5 is approximately 0.96, that is, the prediction is approximately as good as for the AWGN case. Again, the results for *P*_{2σ} are better than for *P*_{2.7σ}, and fitting for *IPSNR* is more accurate than for *IPHVSM*. Improved fitting by means of multiple input parameters has not been investigated yet.
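For signal-dependent noise, the comparison level used in estimating *P*_{2σ} has to follow the local noise standard deviation. The sketch below illustrates this under an explicit assumption that is not taken from the text: the local noise variance is modeled as `sigma0_sq + k * block_mean`, that is, a signal-independent (SI) part plus a signal-dependent (SD) part proportional to the local mean. The function and parameter names are hypothetical, and the DC coefficient is again excluded.

```python
import numpy as np
from scipy.fft import dctn


def p2sigma_signal_dependent(image, sigma0_sq, k, block=8, step=8):
    """Mean P_2sigma with a locally adaptive noise standard deviation."""
    image = np.asarray(image, dtype=np.float64)
    rows, cols = image.shape
    local = []
    for r in range(0, rows - block + 1, step):
        for c in range(0, cols - block + 1, step):
            patch = image[r:r + block, c:c + block]
            # Assumed variance model: SI part plus SD part proportional
            # to the block mean (negative means are clipped to zero).
            local_sigma = np.sqrt(sigma0_sq + k * max(patch.mean(), 0.0))
            ac = np.abs(dctn(patch, norm="ortho")).ravel()[1:]
            local.append(np.mean(ac <= 2.0 * local_sigma))
    return float(np.mean(local))
```

Setting `k = 0` reduces the estimate to the AWGN case with variance `sigma0_sq`.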

Two examples of image processing are presented here. Fig. 6(a) represents the noisy image Frisco, where noise parameters are σ_{0}^{2}=100; _{2σ}=0.92 is approximately 9.5 dB (see the blue fitted curve in Fig. 5), that is, there is good agreement of attained and predicted values. Prediction shows that it is worth applying denoising in this case.

For real-life data, it is impossible to determine the true values of the considered metrics characterizing filtering efficiency. However, it is possible to analyze the predicted values and the denoising results visually. Such an analysis was done for fragments of sub-band images of the hyperspectral sensor Hyperion. The noise parameters of the model (22) were blindly estimated [11]. The noisy image for the 13th sub-band of the set EO1H1800252002116110KZ is depicted in Fig. 7(a). Noise is clearly seen. The predicted *IPSNR* is approximately 8.5 dB and the predicted *IPHVSM* is approximately 5.7 dB. Thus, it is expedient to perform denoising. The denoised image is presented in Fig. 7(b). As can be seen, its quality has improved considerably due to filtering.

The sub-bands 13...22 are considered for two sets of Hyperion data. The values *IPSNR* are always larger than *IPHVSM*. This means that it is harder to provide an improvement of image visual quality than to gain improvement according to standard metrics (MSE, PSNR). For the sub-bands with indices *k* = 13...16, *IPSNR* is always larger than 1.6 dB and *IPHVSM* exceeds 0.6 dB, that is, filtering is desirable. For other sub-bands, as the predicted improvements are small, it is doubtful whether it is worth carrying out filtering. Visual inspection of images in sub-bands with *k* = 17...22 has shown that noise is either hardly noticeable or practically invisible. Positive effect of its removal is partly or fully compensated by edge/detail/texture smearing performed by any filter, even the most sophisticated one [56]. The texture filtering is always problematic and the prediction approach is able to reliably predict this [56].

Considering certain benefits achieved due to using *IPSNR* attains very large values (approximately 10 dB and more).

Additional studies concentrated on the multi-look SAR images that were corrupted by pure multiplicative noise [57]. Analysis has been done for speckle variance *L* denotes the number of looks. Scatterplot points are presented in Fig. 9 for different numbers of looks. An obvious tendency is that mean *P*_{0.5σ} becomes larger and *IPSNR* increases for a smaller number of looks. Other conclusions that can be drawn from the analysis in [57] are the following. Prediction is possible for filtering techniques with and without VST, where the prediction quality is better in the latter case. Prediction using different types of functions (polynomial, power, exponential) produces fits of approximately equal accuracy. Meanwhile, the accuracy of prediction is worth improving (RMSE is approximately 1 dB) since it is considerably worse than for the case of AWGN.

Since, in practice, noise can be spatially correlated [33], the cases of spatially correlated noise (additive in [45] and multiplicative in [57]) have also been studied. A difficulty of dealing with spatially correlated noise is that there are numerous shapes (and parameter sets) of the 2D auto-correlation function or spatial spectrum of such noise. Thus, studying a particular case of spatially correlated noise gives only limited information on general dependences. Hence, two models of spatially correlated noise (called middle correlation and strong correlation) have been considered [45]. A peculiarity of prediction is that the local estimate of probability *P*_{2σ} is obtained according to expression (23), where, in the general case, *P*_{0.5σ} is used, the condition is

The scatterplots and fitted curves are presented in Fig. 10. The fitted curves are similar and they clearly show that there is no reason to filter images if *P*_{0.5σ} is smaller than 0.15. The difference between the scatterplots for *IPHVSM* and *IPSNR* is that the latter is more compact and, thus, *IPSNR* can be predicted more accurately. An additional distinctive feature of the plot for *IPSNR* is that its maximal values are smaller than for the AWGN case (data in Fig. 3(b)). The scatterplots for strongly correlated noise and the conclusions derived from them are similar.

We have also studied the case of spatially correlated speckle [57]. It has been shown that prediction seems possible for spatially correlated noise. However, more research is needed to understand how to select one or several parameters that characterize spatial correlation and how they can be involved in prediction.

Finally, preliminary research has been carried out for denoising color images corrupted by AWGN with equal variance values in the channels [58]. There are two differences in prediction. First, all DCT coefficients in a 3D block are subject to analysis for estimating the local probabilities. Second, the metric PSNR-HMA [59], which is a color extension of PSNR-HVS-M, and the improvement of this metric due to filtering, defined similarly to expression (8), have been used. In addition, instead of BM3D, its color version called C-BM3D has been analyzed [46].

The scatterplots have been obtained and curves were fitted to them (see examples in Fig. 11). As mentioned earlier, filtering is useless for *P*_{0.5σ} < 0.15. However, this happens rarely (only for highly textured images when the noise standard deviation is small). Another observation is the same as earlier: visual quality is predicted less accurately than *IPSNR*. The prediction accuracy for C-BM3D is worse than for the 3D DCT filter.

Taking into account our previous experience, the multiparameter input was analyzed with the exponential function expressed in (20). Considerable improvement has been reached, especially for *IPHVSMA*, for the 3D DCT filter. For the C-BM3D filter, the positive effect is smaller. One has *R*^{2} equal to 0.8481 for one input parameter and 0.8555 for four parameters. Again, a reasonable practical solution is to use the mean and variance of local estimates of probability. One more important observation for color image filtering is that *P*_{0.5σ} for the 3D filter is larger than for the DCT filter applied to components of a processed color image. This again shows that 3D processing of color and multichannel images is potentially more efficient than their component-wise denoising.

## 4. Prediction in lossy compression of noisy images

In this section, the compression of images corrupted by AWGN is considered. Lossy compression is carried out by the aforementioned coder AGU with *IPSNR* and *IPHVSM* and to decide whether OOP exists as well as to predict what CR is.

### 4.1. Prediction of OOP existence and metrics’ values in it

This section briefly describes how the scatterplots were obtained. As in the filtering case, a set of gray-scale test images of different content and complexity was used. AWGN of different intensity has been added and then the obtained images have been compressed by AGU. After this, the parameters (12) and (13) have been calculated as well as *P*_{2σ} for each compressed image. Clearly, all these actions are done off-line before applying the prediction approach in practice.
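As a rough illustration of the input parameter, the sketch below estimates mean *P*_{2σ} for a gray-scale image. It assumes a working definition that this section does not restate: the local estimate in a block is the fraction of AC DCT coefficients whose magnitude does not exceed 2σ, computed in non-overlapping 8×8 blocks with an orthonormal DCT. The function names `dct_matrix` and `mean_p2sigma` and the block layout are illustrative choices, not taken from the cited studies.

```python
import numpy as np


def dct_matrix(n=8):
    """Return the orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)       # DC row has its own normalization
    return c


def mean_p2sigma(image, sigma, block=8, factor=2.0):
    """Mean over blocks of the fraction of AC coefficients with |D| <= factor*sigma.

    Assumption: non-overlapping blocks; the DC coefficient is excluded.
    """
    image = np.asarray(image, dtype=float)
    rows = image.shape[0] - image.shape[0] % block
    cols = image.shape[1] - image.shape[1] % block
    if rows == 0 or cols == 0:
        raise ValueError("image is smaller than one block")
    c = dct_matrix(block)
    local = []
    for r in range(0, rows, block):
        for q in range(0, cols, block):
            d = c @ image[r:r + block, q:q + block] @ c.T  # 2D DCT of the block
            below = np.abs(d) <= factor * sigma
            below[0, 0] = False                            # ignore the DC term
            local.append(below.sum() / (block * block - 1))
    return float(np.mean(local))


# Synthetic check: a flat image with AWGN of standard deviation 5.
rng = np.random.default_rng(0)
flat_noisy = 100.0 + rng.normal(0.0, 5.0, size=(256, 256))
p2s = mean_p2sigma(flat_noisy, sigma=5.0)
print(p2s)
```

For a flat image with pure AWGN, the AC coefficients of an orthonormal DCT are Gaussian with the same standard deviation as the noise, so the estimate should be close to the Gaussian probability of falling within two standard deviations. Textured content pushes the value down, which matches the behavior described for high-complexity images.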

The obtained scatterplot is presented in Fig. 12. A specific feature of this scatterplot is that it has negative values, which seem to be approximately −3.5 dB for *P*_{2σ} approaching zero. Therefore, not all fitting functions can be used. The study carried out by Zemliachenko et al. in [44] has shown that polynomials of the fourth and fifth order usually allow approximating the dependence very well (with *R*^{2} almost equal to unity and RMSE approximately 0.25 for *IPSNR*). As can be seen from the analysis of the scatterplot in Fig. 12, there are quite a few images and/or noise variances for which OOP does not exist (*IPSNR* is negative). OOP exists with high probability if *P*_{2σ} exceeds 0.82. This can be used as a basis for predicting OOP existence.
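The fitting step can be sketched as follows. The scatterplot data here are synthetic stand-ins generated inside the example (they are not the data of Fig. 12), and the helper name `fit_polynomial` is hypothetical. The sketch only shows how a fourth-order polynomial, *R*^{2}, and RMSE could be obtained from pairs of mean *P*_{2σ} and *IPSNR*.

```python
import numpy as np


def fit_polynomial(p, y, order=4):
    """Least-squares polynomial fit with goodness-of-fit measures.

    Returns (coefficients, R^2, RMSE); coefficients are in np.polyval order.
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(p, y, order)
    residuals = y - np.polyval(coeffs, p)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return coeffs, r2, rmse


# Synthetic stand-in for a scatterplot: NOT the data of the cited study.
rng = np.random.default_rng(1)
p_mean = rng.uniform(0.3, 0.97, 200)                           # mean P_2sigma values
ipsnr = -3.5 + 12.0 * p_mean ** 4 + rng.normal(0.0, 0.25, 200)  # made-up dependence, dB

coeffs, r2, rmse = fit_polynomial(p_mean, ipsnr, order=4)
predicted_at_09 = float(np.polyval(coeffs, 0.9))  # prediction for a new image
print(r2, rmse, predicted_at_09)
```

Once the coefficients are stored, prediction for a new image reduces to one call of `np.polyval` on its mean probability, which is why the approach is much faster than compression itself.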

The scatterplot for the metric *IPHVSM* is presented in Fig. 13. In some sense, the behavior of the fitted polynomial is similar to the one in Fig. 12. There are many values of about −4 dB, showing that the visual quality becomes worse due to lossy compression. However, this mainly happens for small *P*_{2σ}, which corresponds to high-complexity images and/or a low noise level. The visual quality improves for *P*_{2σ} exceeding 0.9, and this takes place for low-complexity images and rather intensive noise.

Although prediction has been studied by simulations only for images corrupted by AWGN, it can also be applied to images corrupted by signal-dependent spatially uncorrelated noise, provided that a proper VST is applied to them before compression. Such a VST (a generalized Anscombe transform in this case) provides an approximately constant noise variance that is usually equal to unity. Thus, QS = 4 is used. This approach has been used for Hyperion data and the results are presented in Fig. 14. There are two groups of sub-bands that are usually not analyzed in Hyperion data since they are too noisy. Thus, the predicted values are not given for all sub-bands. Analysis of the presented values shows that there are only a few sub-bands where OOP can be expected. For most other sub-bands, *IPSNR* is about −3 dB, and the ways of dealing with them are considered in [44]. One proposition is to set a smaller QS, but this leads to a smaller CR.

Fig. 15 shows the original and the decompressed images in the 110th sub-band, where a decrease of visual quality according to quantitative criteria is predicted. Noise is not seen in the original image and the compression practically does not influence the image quality (in our opinion, both images look the same).

The study [44] also presents data for three other DCT-based coders, two of which are specially suited for providing better visual quality. It is demonstrated that the coder adaptive DCT (ADCT), which exploits optimized partition schemes [60], provides certain improvements compared to AGU. Meanwhile, DCT coders oriented toward improving the visual quality do not offer substantial benefits when applied to noisy images and, moreover, are even less efficient in many practical situations.

### 4.2. Prediction of compression ratio in OOP

The methodology of predicting CR in OOP is the same as that for filtering. It is based on obtaining a scatterplot and fitting a curve to it. The only difference is that the vertical axis relates to CR, while the horizontal axis, as earlier, corresponds to mean probability. Two mean probabilities, *P*_{2σ} and *P*_{2.7σ}, have been considered, and the latter turned out to be worse again. Therefore, only the results obtained for the mean probability *P*_{2σ} are presented below.

Two lossy compression methods, namely, the coders AGU and ADCT, have been studied. Their scatterplots are presented in Fig. 16. Contrary to other cases considered above, fitting is performed using a sum of two weighted exponential functions. As can be seen, fitting in both cases is very good with *R*^{2} exceeding 0.99. Slightly larger values of CR are provided by the more sophisticated coder ADCT [60]. Very large (over 20) values of CR are provided for *P*_{2σ} > 0.93, that is, for simple structure images corrupted by intensive noise.
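A fit by a sum of two weighted exponentials can be sketched as below. The model form CR ≈ a·exp(b·P) + c·exp(d·P) is an assumed reading of "a sum of two weighted exponential functions". The fitting strategy (grid search over the two rates with linear least squares for the weights) is a simple stand-in chosen for this illustration and is not necessarily the procedure used to obtain Fig. 16. The data are synthetic.

```python
import numpy as np


def fit_two_exponentials(p, cr, rates):
    """Fit cr ~ a*exp(b*p) + c*exp(d*p).

    The rates b < d are searched over the given grid; for each pair the
    weights a, c are found by linear least squares.
    Returns ((a, b, c, d), R^2).
    """
    p = np.asarray(p, dtype=float)
    cr = np.asarray(cr, dtype=float)
    best = None
    for idx, b in enumerate(rates):
        for d in rates[idx + 1:]:          # distinct rates keep the basis well posed
            basis = np.column_stack([np.exp(b * p), np.exp(d * p)])
            weights, *_ = np.linalg.lstsq(basis, cr, rcond=None)
            sse = float(np.sum((basis @ weights - cr) ** 2))
            if best is None or sse < best[0]:
                best = (sse, float(weights[0]), float(b), float(weights[1]), float(d))
    sse, a, b, c, d = best
    r2 = 1.0 - sse / float(np.sum((cr - cr.mean()) ** 2))
    return (a, b, c, d), r2


# Synthetic stand-in data: NOT the scatterplot of the cited study.
rng = np.random.default_rng(3)
p_mean = np.linspace(0.3, 0.97, 80)
cr_obs = 1.5 * np.exp(1.0 * p_mean) + 1e-5 * np.exp(15.0 * p_mean)
cr_obs = cr_obs + rng.normal(0.0, 0.2, p_mean.size)

params, r2_cr = fit_two_exponentials(p_mean, cr_obs, np.arange(0.0, 20.5, 0.5))
a, b, c, d = params
cr_at_095 = a * np.exp(b * 0.95) + c * np.exp(d * 0.95)
print(params, r2_cr, cr_at_095)
```

One slowly growing and one rapidly growing exponential capture the two regimes seen in such data: a nearly flat CR for moderate probabilities and a steep rise as the mean probability approaches unity.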

We did not have real-life multichannel images corrupted by AWGN. However, hyperspectral data from the sensors Hyperion and airborne visible/infrared imaging spectrometer (AVIRIS) were available. Noise in them is signal dependent [14] with a prevailing SD component for the model (22). The parameters of this noise were estimated in an automatic manner [11] and, thus, it became possible to apply a VST (a generalized Anscombe transform with properly adjusted parameters) that converts the noise into pure additive noise with unit variance.
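The variance-stabilizing step can be illustrated with the following sketch. It assumes that the model (22) describes a noise variance of the form σ_0^2 + k·I, where I is the true image value; the parameter name `k` and the function names are choices made for this example. The forward transform is the standard generalized Anscombe transform, and the inverse is the plain algebraic one, which ignores the small bias of the forward transform.

```python
import numpy as np


def gat_forward(x, sigma0_sq, k):
    """Generalized Anscombe transform for noise variance sigma0_sq + k * signal."""
    arg = k * np.asarray(x, dtype=float) + 0.375 * k ** 2 + sigma0_sq
    return (2.0 / k) * np.sqrt(np.maximum(arg, 0.0))  # clip to avoid sqrt of negatives


def gat_inverse(y, sigma0_sq, k):
    """Algebraic inverse of gat_forward (not bias-corrected)."""
    y = np.asarray(y, dtype=float)
    return 0.25 * k * y ** 2 - 0.375 * k - sigma0_sq / k


# Simulated check with made-up parameters: constant level plus
# signal-dependent Gaussian noise of variance sigma0_sq + k * level.
rng = np.random.default_rng(2)
level, sigma0_sq, k = 1000.0, 25.0, 2.0
noisy = level + np.sqrt(sigma0_sq + k * level) * rng.standard_normal(100_000)

stabilized = gat_forward(noisy, sigma0_sq, k)
stabilized_std = float(stabilized.std())
restored = gat_inverse(stabilized, sigma0_sq, k)
print(stabilized_std)
```

In a compression pipeline of the kind described here, the coder would operate on the stabilized data, where the noise standard deviation is close to unity, and `gat_inverse` would be applied to the decompressed result.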

Lossy compression in the OOP neighborhood has been applied after the VST. After decompression, the inverse transform has to be applied. The obtained and predicted values of CR for Hyperion data are depicted in Fig. 17(a). As can be seen, the curves are in good agreement. There are some channels where the predicted CRs are slightly larger than the attained ones. This is explained by the imperfection of the VST and of the blind estimation of noise parameters for channels with a high signal-to-noise ratio (SNR). The largest CRs are observed for sub-bands with low SNR (these are the sub-bands with indices 13–20, 125–130, and 175–180).

The results for the AVIRIS test image Lunar Lake are given in Fig. 17(b). Here, the agreement between the predicted and the attained values is even better than for the Hyperion data. Again, the largest CR is observed for sub-bands with low SNR. There are considerable differences between the maximal and minimal values of CR. The main reason is the different SNR and the different dynamic range in sub-band images. Certainly, CR also depends upon the image content.

## 5. Conclusions and future work

It is demonstrated that it is possible to predict the efficiency of image filtering as well as the parameters of lossy compression of a noisy image in the OOP neighborhood. As opposed to the earlier known approaches that allow predicting the potential efficiency of filtering, the present approach predicts practically reachable performance and does so very rapidly, one or more orders of magnitude faster than filtering or compression itself.

Certainly, only a limited number of quality metrics, filtering techniques, and compression techniques have been considered. However, it is important that a general methodology of prediction is proposed, and it is shown that there are fairly strict connections between simple input parameters (that can be easily and quickly calculated) and output parameters that are able to adequately characterize the efficiency of filtering or lossy compression techniques. Certain facts speak in favor of this methodology. First, there are many modern filters that have filtering efficiency of the same order as the DCT-based filter and BM3D. Thus, by predicting denoising efficiency for the filters mentioned above, it is possible to approximately predict performance for other modern filters (although such prediction would be less accurate). Second, the same holds for lossy compression methods. For example, AGU and JPEG2000 provide similar performance characteristics. Then, by predicting compression parameters for AGU, they are, in fact, estimated for JPEG2000 as well.

Concerning the decision on whether to perform filtering or not, strict recommendations have been given for the probabilities *P*_{2σ} and *P*_{0.5σ}. Filtering can be expedient if *P*_{2σ} exceeds 0.5 or *P*_{0.5σ} exceeds 0.15. Similarly, OOP is quite possible if *P*_{2σ} is approximately 0.85 or larger. A very important fact is that these rules for filtering are valid for different types of noise (pure additive and signal-dependent, additive white Gaussian and spatially correlated). This generalization can be considered as one of the main contributions of this chapter. Meanwhile, the case of spatially correlated noise requires more attention in the future. In prediction of filtering efficiency, general prediction approximations for spatially correlated noise with *a priori* known or pre-estimated properties (e.g., 2D spectrum) have not been obtained yet. It can only be expected that the scatterplots for spatially correlated noise with other (not yet analyzed) shapes and parameters of the spatial power spectrum behave similarly. The studies for lossy compression of images corrupted by spatially correlated noise are yet to be started. This opens a very wide field for future research.
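The decision rules stated above can be written as a small helper. The thresholds are the ones quoted in this section. The function name, the returned dictionary, and the choice of a non-strict comparison for the "approximately 0.85 or larger" rule are assumptions made for this sketch.

```python
# Thresholds quoted in the text; names are illustrative.
FILTER_P2SIGMA_MIN = 0.5    # filtering can be expedient above this mean P_2sigma
FILTER_P05SIGMA_MIN = 0.15  # ... or above this mean P_0.5sigma
OOP_P2SIGMA_MIN = 0.85      # OOP is quite possible at about this value or larger


def recommend(mean_p2sigma, mean_p05sigma):
    """Return coarse recommendations from the two mean probabilities."""
    return {
        "filter": (mean_p2sigma > FILTER_P2SIGMA_MIN
                   or mean_p05sigma > FILTER_P05SIGMA_MIN),
        "oop_likely": mean_p2sigma >= OOP_P2SIGMA_MIN,
    }


print(recommend(0.92, 0.40))
print(recommend(0.40, 0.10))
```

Such a rule gives only a yes/no answer. The fitted curves remain necessary when the expected size of the improvement or the compression ratio is needed.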

The results of this research show that, although the prediction of performance characteristics based on one input parameter is sometimes sufficiently accurate, there are several means to improve the prediction accuracy. One way, which deals with multiparameter input, has already been used for particular cases. The use of mean *P*_{0.5σ} has proved to be a good solution, although it has not yet been tried for all possible applications. In particular, mean *P*_{0.5σ} has not been tested for lossy compression. It is hoped that its use can improve prediction performance there. Neural networks or other approximators of multidimensional functions (surfaces) can be useful.

There are also other possible directions for future research. 3D filtering warrants a more thorough study, at least for the case of more than three channels. The same relates to 3D lossy compression performance, which no attempt has yet been made to predict. Compression parameters for QS other than the one recommended for OOP are also of considerable interest in DCT-based lossy compression. The influence of errors in *a priori* information on noise parameters, or in their blind estimates, on prediction accuracy has to be studied as well.