Goodness of fit (R2) of the obtained approximations
In the processing of multichannel remote sensing data, there is a need to automate basic operations such as filtering and compression. Automation presumes deciding whether image filtering is expedient. It also involves obtaining information on which decisions can be based or from which the parameters of processing algorithms can be chosen. For the considered operations of denoising and lossy compression, it is shown that their basic performance characteristics can be predicted quite easily from simple local statistics calculated in the discrete cosine transform (DCT) domain. The described prediction methodology is shown to be general and applicable to different types of noise, provided that the basic noise characteristics are known in advance or pre-estimated accurately.
- Multichannel remote sensing data
- automatic processing
- lossy compression
- performance prediction
Remote-sensing (RS) data are widely used in numerous applications. Primary RS images acquired onboard airborne or spaceborne carriers and intended for Earth surface monitoring are usually not ready for direct use and are therefore subject to preprocessing. This preprocessing can be carried out in several stages and includes the following operations: geo-referencing and calibration, blind estimation of noise/distortion characteristics, pre-filtering, lossless or lossy compression, etc. These operations can be distributed between onboard and on-land computing facilities (processors) in different ways, depending on many factors [3-5].
Regardless of the distribution of functions, the operations onboard are usually performed in a fully automatic manner (although algorithm parameters can be changed by commands passed from Earth). In turn, the operations carried out on land can, in general, be performed interactively, relying on the labor of highly qualified experts. However, a certain degree of automation of on-land data processing is required as well. The need for automated processing is especially high for multichannel (e.g., hyperspectral) RS data, where the number of channels (components, sub-bands) can reach hundreds. Such RS images have become popular and widely available due to their potential ability to provide rich information for various applications. This raises the following questions:
How to manage large volumes of acquired data with maximal or appropriate efficiency (here, different criteria of efficiency can be used)?
Is it possible to skip some operations of data processing if their efficiency is not high and, consequently, if it is not worth performing them?
The latter question is the main one addressed below. It is closely connected with the following questions:
Is it possible to predict the performance of some standard operations of RS data (image) processing?
What is the accuracy of such a prediction, and is this accuracy high enough to decide to skip an operation or to set a certain value of some parameter used in the image-processing chain?
This chapter focuses on two typical operations of multichannel RS data processing, namely, filtering and lossy compression. While considering them, the fact that the acquired images are noisy is taken into account. One can argue that noise is not seen in many RS images (or components of these images). This is true: noise cannot be observed in approximately 80% of the visualized sub-band images of hyperspectral data. This is explained by the peculiarities of human vision, which does not see noise if the peak signal-to-noise ratio (PSNR) in a given single-channel (component) image exceeds 32–38 dB. However, recent studies [10-12] have demonstrated that noise is present in all sub-band images and that this is due to the principle of operation of hyperspectral imagers.
Moreover, it has been shown that the noise can be of quite a complex nature and that the noise in multichannel RS images has specific properties. First, it is signal-dependent. Second, its intensity differs substantially between sub-bands (see Abramov et al., 2015). More precisely, the wide variation of dynamic range and noise intensity in sub-band images jointly leads to wide limits of signal-to-noise ratio (SNR) in components of multichannel images. This has led to the use of the term “junk bands” and to different strategies for coping with noisy channels in multichannel data. Some researchers prefer to use these sub-bands in further processing while others propose to remove them; whether they can be filtered is also a matter of discussion. It has been shown that if filtering of these junk bands is efficient, it can improve the classification of hyperspectral data. Thus, the aforementioned questions concern the efficiency of image preprocessing and its prediction.
The questions raised can be partly answered with the results obtained in recent research. The objective is to show that important performance parameters of image denoising and/or lossy compression can be predicted quickly and quite accurately using simple input parameter(s) and dependences obtained in advance. The obtained results are divided into two parts. The first part deals with the prediction of filtering efficiency. This research started in 2013 and builds on an earlier study. The second part relates to the compression of noisy images. In fact, the results obtained for predicting the parameters of lossy compression are based on the same principle as that for image filtering and can be treated as its further development.
Before considering the image performance criteria and preprocessing techniques, several points should be noted. First, two hypotheses are adopted: the noise type is assumed to be known or determined in advance, and its parameters are assumed to be either known or accurately pre-estimated. Currently, there are quite a few efficient methods for estimating the parameters of pure additive noise [21-25], speckle noise, and different types of signal-dependent noise [10-12]. The noise parameters are taken into account by the most modern filtering techniques, which belong to the families of orthogonal-transform-based filters [29-33] and nonlocal filters, for example, block-matching and three-dimensional filtering (BM3D). The same relates to modern methods of lossy compression of noisy images.
Second, we restrict ourselves to image filtering and compression techniques based on the discrete cosine transform (DCT). There are several reasons for this. DCT is a powerful orthogonal transform widely exploited in image processing. Filters and compression techniques based on DCT are currently among the best. They can be quite easily adapted to signal-dependent noise directly or equipped with proper variance-stabilizing transformations (VST). This restriction does not mean that the approach to prediction cannot be applied to other filtering and lossy compression techniques. The approach should be applicable (with certain modifications), but this is yet to be thoroughly checked.
Third, in the analysis of the prediction approach, traditional quality metrics are employed such as mean square error (MSE) and peak signal-to-noise ratio (PSNR), as well as some visual quality metrics such as PSNR human visual system masking metric (PSNR-HVS-M) . Behavior and properties of traditional metrics are understood well by those dealing with image processing. Although PSNR-HVS-M is less popular, this is one of the best metrics that takes into account the peculiarities of human visual system (HVS) and that can be calculated for either one component of a multichannel image or a group of components of a multichannel image. It is expressed in dB, and it is usually either slightly smaller than PSNR (for annoying types of distortions like spatially correlated noise) or larger than PSNR (if distortions are masked by texture). This is important since we assume that the processing of multichannel images is carried out either component-wise or in groups of channel images, where a group includes the entire image in marginal case.
Fourth, other criteria of image-processing efficiency, such as classification accuracy, object detectability, etc., are important for the preprocessed RS data. We are unable to predict them, but recent research shows  that these criteria are connected with the traditional criteria of image processing. Thus, it is expected that if good values of conventional and HVS metrics are provided due to preprocessing, appropriate classification accuracy and other criteria will be attained.
2. The considered image-performance criteria and preprocessing techniques
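This chapter considers the following model of an observed multichannel image:
$$I^{noisy}_{kij} = I^{true}_{kij} + n_{kij}\left(I^{true}_{kij}\right), \quad i = 1,\dots,I, \; j = 1,\dots,J, \; k = 1,\dots,K, \tag{1}$$
where $I^{noisy}_{kij}$ is the ij-th sample of the noisy (original) k-th component of a multichannel image, $I^{true}_{kij}$ is the corresponding true (noise-free) value, $n_{kij}$ denotes the ij-th value of the noise in the k-th component, which is, in general, supposed to depend on the true image value in this voxel (3D pixel), I and J define the image size, and K denotes the number of channels. It is also assumed that neighboring component images $\{I^{true}_{kij}\}$ and $\{I^{true}_{(k+1)ij}\}$ are strongly correlated and have similar dynamic ranges $D_k$ and $D_{k+1}$ determined as $D_k = I_{k\,max} - I_{k\,min}$, where $I_{k\,max}$ and $I_{k\,min}$ are the maximal and minimal values in the k-th channel image, respectively. It is also possible to assume that the noise is of the same type and that neighboring channels have quite close values of input MSE (equal to the noise variance if the noise is pure additive):
$$MSE^{inp}_{k} = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J}\left(I^{noisy}_{kij} - I^{true}_{kij}\right)^2 \tag{2}$$
and input PSNR
$$PSNR^{inp}_{k} = 10\log_{10}\left(D_k^2 / MSE^{inp}_{k}\right). \tag{3}$$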
The same assumptions are valid for the input PSNR-HVS-M, which is determined similarly to expression (3) with the difference that the conventional MSE is replaced by $MSE_{HVS}$, a special kind of weighted MSE calculated in the spectral (DCT) domain considering the masking effects. The aforementioned assumptions are valid for color red, green, blue (RGB) images, multispectral and hyperspectral RS images, dual-polarization images, and multifrequency radar images. These properties can be effectively exploited in multichannel image preprocessing.
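After applying a considered filter, one obtains a filtered image $\{I^{f}_{kij}\}$ that is supposed to be closer to the true image $\{I^{true}_{kij}\}$ according to a chosen metric (a quantitative criterion). The output metrics are calculated as
$$MSE^{out}_{k} = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J}\left(I^{f}_{kij} - I^{true}_{kij}\right)^2, \tag{4}$$
$$PSNR^{out}_{k} = 10\log_{10}\left(D_k^2 / MSE^{out}_{k}\right), \tag{5}$$
where $D_k$ is the dynamic range of the k-th channel image.
The output PSNR-HVS-M is determined similarly to (5).
Then, one has to characterize the efficiency of filtering. One way to do this is to use
$$\kappa = MSE^{out}_{k} / MSE^{inp}_{k}, \tag{6}$$
$$IPSNR = PSNR^{out}_{k} - PSNR^{inp}_{k}, \tag{7}$$
$$IPHVSM = PSNR\text{-}HVS\text{-}M^{out}_{k} - PSNR\text{-}HVS\text{-}M^{inp}_{k}, \tag{8}$$
where the superscript inp refers to the same metrics calculated for the noisy image before filtering.
Small values of the ratio in expression (6) and large values of expressions (7) and (8), both expressed in dB, are evidence in favor of efficient filtering.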
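As a minimal sketch of how these efficiency measures can be computed for a single channel when the true image is available (as in simulations), the snippet below assumes that the ratio in expression (6) is the output MSE divided by the input MSE and that the dynamic range is taken from the true image. The function names are illustrative and NumPy is assumed to be installed.

```python
import numpy as np

def mse(reference, test):
    """Mean square error between two images of equal size."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    return float(np.mean((reference - test) ** 2))

def psnr(reference, test, dynamic_range):
    """PSNR in dB for a given dynamic range D (max minus min)."""
    return 10.0 * np.log10(dynamic_range ** 2 / mse(reference, test))

def filtering_efficiency(true, noisy, filtered):
    """Return (MSE ratio, PSNR improvement in dB) for one channel."""
    true = np.asarray(true, dtype=float)
    # Assumption: dynamic range is measured on the true image.
    d = float(true.max() - true.min())
    kappa = mse(true, filtered) / mse(true, noisy)
    ipsnr = psnr(true, filtered, d) - psnr(true, noisy, d)
    return kappa, ipsnr
```

Note that the PSNR improvement equals minus ten times the decimal logarithm of the MSE ratio, so the dynamic range cancels out of the improvement itself.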
Similarly, after lossy compression, one obtains a compressed image $\{I^{c}_{kij}\}$. It is usually supposed that a larger compression ratio (CR) leads to worse quality of the compressed image. This is true for lossy compression of noise-free images, where more distortions are introduced for a larger CR. However, in lossy compression of noisy images, the situation is specific. Under certain conditions, lossy compression produces a filtering (noise removal) effect. Due to this effect, it is possible that
$$MSE^{c}_{k} = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J}\left(I^{c}_{kij} - I^{true}_{kij}\right)^2$$
turns out to be less than the input MSE of the noisy image. Then, the value of the compression parameter (quantization step (QS), scaling factor (SF), or bits per pixel (bpp), depending on the coder used) at which $MSE^{c}_{k}$ reaches its global minimum is called the optimal operation point (OOP). This parameter is important and needs additional explanation. Fig. 1(a) presents the dependences of
$$PSNR^{c} = 10\log_{10}\left(D^2 / MSE^{c}\right)$$
on QS for the lossy DCT-based coder AGU for two known gray-scale test images, Airfield (Fig. 1(b)) and Frisco (Fig. 1(c)), corrupted by additive white Gaussian noise (AWGN) with a fixed variance $\sigma^2$. The test image Frisco has a simpler structure: it contains more homogeneous image regions that correspond to the sea surface. Due to this, the filtering effect of lossy compression is larger and the dependence has an obvious global maximum (i.e., the OOP) according to PSNR, since the maximum of $PSNR^{c}$ corresponds to the minimum of $MSE^{c}$. Formally, there is no OOP for the other test image, Airfield, but the dependence has a local maximum. Both aforementioned maxima occur at the same QS, which is the recommended choice for the coder AGU.
The lossy compression in the neighborhood of OOP has obvious advantages. Compressed images have high quality, and, at the same time, they have CR considerably larger than for lossless compression , . Because of these benefits, the lossy compression of noisy images in the OOP neighborhood is considered. If OOP does not exist, nevertheless, the recommended setting can be considered. If noise is signal dependent and VST is not used, the setting is where . Then, in OOP, one has parameters and it is possible to determine for them the following metrics (parameters characterizing compression performance):
where and positive or mean that OOP exists according to the corresponding metric.
Certainly, there are also other valuable performance criteria. For image pre-filtering, it is important to know the computational efficiency of the denoising method and how easily it can be implemented, especially onboard. For image lossy compression, it is important to know CR provided and how easily it can be attained. To partly address these issues, the filtering and compression techniques are briefly described.
DCT-based filtering is performed in a block-wise manner, with 8 × 8 pixels as the typical block size. Filtering can be performed with nonoverlapping, partly overlapping, or fully overlapping blocks. In the latter case, filtering efficiency (expressed as improvement of PSNR (IPSNR) or improvement of PSNR-HVS-M (IPHVSM)) is the highest, but more computations are needed. Nevertheless, the filter is very fast since it is possible to use fast algorithms and to parallelize computations.
There are three main steps in processing each block: direct 2D DCT; thresholding of the DCT coefficients; and inverse DCT applied to the thresholded coefficients. After that, the filtered data from overlapping blocks are aggregated. Within this structure, different variants of thresholding are possible, but hard thresholding is preferred: DCT coefficient values remain unchanged if their amplitudes exceed a threshold and are set to zero otherwise. If one deals with AWGN with standard deviation $\sigma$, the threshold is fixed as
$$T = \beta\sigma. \tag{14}$$
For spatially uncorrelated signal-dependent noise with an a priori known or accurately pre-estimated dependence $\sigma_{bl} = f(\bar{I}_{bl})$ of the local standard deviation on the local (block) mean $\bar{I}_{bl}$, one has to set a locally adaptive threshold:
$$T_{bl} = \beta f(\bar{I}_{bl}). \tag{15}$$
Finally, for spatially correlated and signal-dependent noise with an a priori known or properly pre-estimated normalized DCT spectrum $W_{norm}(q,s)$, where q and s are indices of DCT coefficients in blocks, the thresholds are locally adaptive and frequency dependent:
$$T_{bl}(q,s) = \beta f(\bar{I}_{bl})\sqrt{W_{norm}(q,s)}. \tag{16}$$
In expressions (14–16), β is a filter parameter. Depending on the image complexity and noise intensity, its optimal value can vary slightly, but the recommended choices are β = 2.6 to provide good filtering according to PSNR and β = 2.3 to ensure quasi-optimal denoising according to PSNR-HVS-M. In further studies, β = 2.6 is used. A 3D version of the DCT-based filter operates similarly. The difference is that the blocks are 3D, of size 8 × 8 × Kgr, where Kgr ≤ K denotes the channel group size.
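The block-wise procedure described above can be sketched as follows for the AWGN case with a fixed threshold equal to β times the noise standard deviation. This is a simplified illustration and not the authors' implementation: the function names are hypothetical, overlapping blocks are aggregated by plain averaging (an assumption), and the DCT is computed with an explicit orthonormal matrix so that only NumPy is required.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C; C @ x is the 1D DCT of x."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def dct_denoise(noisy, sigma, beta=2.6, block=8, step=1):
    """Hard-thresholding DCT filter; step=1 gives fully overlapping blocks."""
    noisy = np.asarray(noisy, dtype=float)
    c = dct_matrix(block)
    threshold = beta * sigma
    acc = np.zeros_like(noisy)
    weight = np.zeros_like(noisy)
    rows, cols = noisy.shape
    for r in range(0, rows - block + 1, step):
        for s in range(0, cols - block + 1, step):
            patch = noisy[r:r + block, s:s + block]
            coef = c @ patch @ c.T                    # direct 2D DCT
            coef[np.abs(coef) <= threshold] = 0.0     # hard thresholding
            acc[r:r + block, s:s + block] += c.T @ coef @ c  # inverse DCT
            weight[r:r + block, s:s + block] += 1.0
    out = noisy.copy()            # pixels not covered by any block stay as-is
    covered = weight > 0
    out[covered] = acc[covered] / weight[covered]
    return out
```

With `step=block` the blocks do not overlap, which is faster but, as noted above, less efficient than full overlapping.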
Conventional BM3D  is a more sophisticated denoising method. It presumes search for similar patches (blocks), with their joint processing in a 3D manner using DCT and Haar transform, and post-processing stage. This filtering principle, originally designed to cope with AWGN in gray-scale images, has been later adapted to the cases of signal-dependent noise after a proper VST , spatially correlated noise  and color (three-channel) images corrupted by AWGN . The BM3D and its modifications provide a slightly better performance than the corresponding modifications of the conventional DCT-based denoising by the expense of considerably more extensive computations.
The lossy compression technique called AGU is based on DCT in 32 × 32 pixel blocks, a more efficient (compared to JPEG) coding of quantized DCT coefficients, and post-processing to remove blocking artifacts after decompression. This coder is quite simple but slightly more efficient than JPEG 2000 or set partitioning in hierarchical trees (SPIHT) in the rate/distortion sense. The coder also has a 3D version, and CR for both the 2D and 3D versions is controlled by QS.
3. Prediction of filtering efficiency
The main idea of filtering efficiency prediction is the following . Suppose there is some input parameter(s) able to jointly characterize image complexity and noise intensity and also there is some output parameter(s) capable of adequately describing the image denoising efficiency. Assume that there is a rather strict connection between these input and output parameters that allows predicting output value(s) having input value(s).
An additional assumption (and requirement to prediction) is that input parameter(s) have to be calculated easily and quickly enough, faster than denoising itself (otherwise, the prediction becomes useless). If all these assumptions are valid, it becomes possible to determine a predicted output value before starting image filtering and to decide whether it is worth filtering a given image (component) or not. Another decision can relate to setting parameter(s) of a used filter. For example, if a processed image seems to be textural (having high complexity), parameter(s) of a used filter can be adjusted to provide better edge/detail/texture preservation. For example, the parameter β for the DCT-based filter can be set equal to 2.3.
Keeping these general principles in mind, we have to address several tasks:
What is a good (in the best case, optimal) input parameter (or a set of parameters)?
What is a good (proper, acceptable) output parameter (or a set of parameters) that allows to characterize the filtering efficiency adequately and to undertake a decision (on using filtering or not, on setting a filter parameter, etc.)?
How to get dependence between output and input parameters and how accurate it is?
These questions are partly answered below, and the outcomes obtained in the design and performance analysis of prediction techniques are described. We believe that a partial answer to the second question is the following. The ratio in expression (6) as well as the parameters IPSNR and IPHVSM (especially if analyzed jointly) are able to provide initial insight into filtering efficiency. Note that the metrics in expressions (6) and (7) are mutually dependent: the PSNR improvement in dB equals minus ten times the decimal logarithm of the MSE ratio. Thus, they can be used as output parameter(s) at the current stage of research.
3.1. Input and output parameter sets testing and comparison
Based on the outcomes of the study , Abramov et al. in 2013  observed that there is dependence between efficiency of filtering expressed by (6) and simple statistics of DCT-coefficients determined in 8 × 8 blocks. Two probability parameters have been considered. The first one denoted as P2σ is the mean probability that the amplitudes of DCT coefficients are not larger than 2σ, where σ denotes the standard deviation of additive white Gaussian noise. This parameter originated from analogies with known sigma filter . The second parameter denoted as P2.7σ is the mean probability that the amplitudes of DCT coefficients are larger than 2.7σ. Here, there is an obvious analogy with hard thresholding in DCT-based filter, where the recommended β = 2.7. At the starting point, Abramov et al., 2013 had no idea on the optimality of input parameters. The objective was just to check whether the prediction is possible, in principle, using a restricted set of test gray-scale images (18) and standard deviations of AWGN (5, 10, 15). The data have been presented as scatterplots, where the Y-axis reflects the ratio in expression (6) and X-axis corresponds to a considered statistical (input) parameter (either P2σ or P2.7σ). These scatterplots are represented in Fig. 2. Obviously, the scatterplots’ points are clustered well along the fitted lines (for easy fitting, second-order polynomials were used). Interestingly, small P2σ and large P2.7σ correspond to complex structure images corrupted by low-intensity noise. In this case, efficiency of image filtering is low (the ratio in expression (6) is close to unity, see Fig. 2). Note that this is in agreement with the theory of filtering , . It shows that efficient filtering of textural images is problematic for any existing filters including the most sophisticated nonlocal ones .
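A minimal sketch of how the two probability parameters could be estimated for an image corrupted by AWGN with known standard deviation is given below. The function names are illustrative, nonoverlapping 8 × 8 blocks are used by default, and the DC coefficient of each block is excluded, which is an assumption made here because the DC term mainly reflects the block mean rather than noise.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix (noise variance is preserved per coefficient)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def dct_probabilities(noisy, sigma, block=8, step=8):
    """Return mean P2sigma (|D| <= 2*sigma) and mean P2.7sigma (|D| > 2.7*sigma)."""
    noisy = np.asarray(noisy, dtype=float)
    c = dct_matrix(block)
    p_le_2, p_gt_27 = [], []
    rows, cols = noisy.shape
    for r in range(0, rows - block + 1, step):
        for s in range(0, cols - block + 1, step):
            coef = np.abs(c @ noisy[r:r + block, s:s + block] @ c.T)
            ac = coef.ravel()[1:]          # drop the DC coefficient
            p_le_2.append(np.mean(ac <= 2.0 * sigma))
            p_gt_27.append(np.mean(ac > 2.7 * sigma))
    return float(np.mean(p_le_2)), float(np.mean(p_gt_27))
```

For a perfectly homogeneous image corrupted by AWGN, the first value tends to the Gaussian probability of about 0.95 and the second to about 0.007; textured content moves both values away from these limits.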
The results of that study have also shown the following. First, the quality of fitting has to be characterized quantitatively. A standard goodness-of-fit approach works well for this purpose. It provides the coefficient of determination R2, which tends to unity for perfectly fitted curves, and the root mean square error (RMSE) of fitting, which should be as small as possible. These parameters are closely connected with prediction accuracy. For perfectly determined P2σ or P2.7σ, the RMSE of fitting directly describes the accuracy of prediction.
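The two goodness-of-fit measures can be computed as sketched below for a polynomial fitted to scatterplot points. The helper name `fit_quality` is hypothetical; the second-order polynomial matches the fitting used for the first scatterplots, but any other fitting function could be scored in the same way.

```python
import numpy as np

def fit_quality(x, y, degree=2):
    """Fit a polynomial and return (coefficients, R2, RMSE of fitting)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    ss_res = float(np.sum(residuals ** 2))           # residual sum of squares
    ss_tot = float(np.sum((y - y.mean()) ** 2))      # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = float(np.sqrt(ss_res / y.size))
    return coeffs, r2, rmse
```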
The conclusions drawn in  can be recalled here. First, the prediction of filtering efficiency for BM3D is less accurate than for the conventional DCT-based filter. This conclusion has been confirmed in later studies. This is associated with the use of two denoising mechanisms (DCT denoising and similar block search with their joint processing), where the latter mechanism has no connection to DCT statistics. Second, although the prediction accuracy for both P2σ and P2.7σ is quite good (R2 > 0.9 and RMSE < 1.0), the probability P2σ provides sufficiently better prediction (quality of fitting) than P2.7σ. This shows that the use of other input parameters is possible. Third, different types of fitting functions (polynomials, power and exponential functions) were able to provide approximately the same quality of fitting (for example, the fitted curve in Fig. 2(a) is ; for the BM3D filter, the obtained function of is ). Thus, certain reserves in improving the fitting accuracy “are hidden” in choosing an approximating curve and its parameters. Fourth, it has also been shown that the probabilities P2σ and P2.7σ can be determined with appropriate accuracy from analysis of not all possible overlapping blocks but from partly or even nonoverlapping blocks if their total number is not less than 300...500. This additionally accelerates the prediction compared even to conventional DCT-based filtering.
There are also observations understood later (in two recent years). First, there should be some restrictions imposed on the approximating function. For example, it is clear that the ratio in expression (6) cannot be negative. It is also clear that an approximating (fitting) function should be determined for all possible values of its arguments. Since the probabilities serve as arguments, they can vary from zero to unity. Meanwhile, arguments in both scatterplots in Fig. 2 vary in narrower limits. Besides, it could be good for curve fitting to have point arguments with approximately uniform density.
These requirements have been satisfied by using considerably more test images (including highly textural ones) and a wider set of noise standard deviations (including quite small ones). This has allowed obtaining scatterplot points for small P2σ and large P2.7σ.
Examples of the obtained scatterplots and fitted curves for the DCT-based denoising are shown in Fig. 3. As can be seen, the fitting is rather good and the coefficient of determination is approximately 0.95 (see the details below). We believe these are already good results that allow practical recommendations. For example, it is clearly seen that there is no reason to carry out filtering if P2σ is smaller than 0.5, since the benefit obtained due to denoising is negligible (approximately 1 dB or less). Prediction itself is carried out as follows. Having the fitted curves obtained in advance as described above, one needs to calculate P2σ or P2.7σ for a given image before filtering and to substitute it as the argument into the approximating function to calculate the desired metric that characterizes the predicted denoising efficiency.
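The prediction step can be illustrated by the following sketch. It does not use the authors' fitted curves: the three calibration points are only rough (P2σ, IPSNR) values quoted in this chapter, the exponential fitting function is one of the function types mentioned above and is chosen here for simplicity, and the 1 dB decision threshold follows the remark that such a benefit is negligible. All names are hypothetical.

```python
import math

# Rough (mean P2sigma, IPSNR in dB) pairs quoted in the text; illustrative only.
CALIBRATION = [(0.5, 1.0), (0.8, 4.0), (0.92, 9.5)]

def fit_exponential(points):
    """Least-squares fit of IPSNR = a * exp(b * P) in the log domain."""
    xs = [p for p, _ in points]
    ys = [math.log(v) for _, v in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b

def predict_ipsnr(p2sigma, a, b):
    """Predicted PSNR improvement (dB) for a measured mean P2sigma."""
    return a * math.exp(b * p2sigma)

def worth_filtering(p2sigma, a, b, min_gain_db=1.0):
    """Decide whether the predicted gain justifies running the filter."""
    return predict_ipsnr(p2sigma, a, b) > min_gain_db
```

In practice the curve would be fitted once, offline, on a large set of test images and noise levels; only the evaluation of the fitted function is performed for each new image.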
Expressions for the obtained approximations for the DCT filter are as follows (we give only the functions of , more details can be found in ):
The values of R2 are presented in Table 1. The analysis confirms that it is better to use P2σ than P2.7σ. Prediction of κ is slightly more accurate than the prediction of IPSNR. However, the prediction of IPHVSM is worth improving.
It has been discovered that not only the mean of local (block) estimates of probability P2σ is connected with predicted metrics , but the other statistical parameters of the distribution of local estimates can also be exploited to improve prediction. The general framework to obtain an estimate of a predicted metric by multiparameter fitting is described by the following formula:
where a and bi are approximation factors, Oi. i = 1,...,n, is some parameter of distribution, n defines the number of such parameters. As Oi, it is possible to use the distribution mean, median, mode, variance, skewness, and kurtosis. The factors a and bi, i = 1,...,n have to be obtained in advance by multidimensional (n-dimensional) regression.
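A possible way to obtain such factors by regression is sketched below. The exact functional form of the authors' multiparameter expression is not reproduced here; the sketch assumes a plain linear combination of the distribution parameters, which may differ from the original formula. The mode is omitted because its estimate depends on a histogram binning choice, and all function names are hypothetical.

```python
import numpy as np

def distribution_features(local_estimates):
    """Mean, variance, median, skewness and kurtosis of local probability estimates."""
    p = np.asarray(local_estimates, dtype=float)
    mean, var = p.mean(), p.var()
    std = np.sqrt(var)
    centered = p - mean
    skew = np.mean(centered ** 3) / std ** 3 if std > 0 else 0.0
    kurt = np.mean(centered ** 4) / std ** 4 if std > 0 else 0.0
    return np.array([mean, var, np.median(p), skew, kurt])

def fit_linear(features, targets):
    """Least-squares estimate of a and b_i in: metric = a + sum_i b_i * O_i."""
    features = np.asarray(features, dtype=float)
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, np.asarray(targets, dtype=float), rcond=None)
    return coef[0], coef[1:]

def predict(a, b, feature_vector):
    """Predicted metric for one image from its feature vector."""
    return float(a + np.dot(b, feature_vector))
```

Each row of `features` corresponds to one training image (one scatterplot point), and any subset of the feature columns can be selected to compare parameter sets.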
The results of using multidimensional regression are presented in Table 2. The abbreviations used are the following: M – mean; Var – variance; Med – median, Mod – mode; K – kurtosis; S – skewness; all calculated for a set of local estimates of probability P2σ. The results are given for both considered filters for the metrics IPSNR and IPHVSM. Only the best sets for n from 1 to 5 are presented since the joint use of all considered parameters is less efficient than five input parameters employed together.
|M, Var, Mod||0.974|
|M, Var, Mod, K||0.976|
|M, Var, Med, Mod, S||0.977|
|M, Var, Med||0.926|
|M, Var, Med, S||0.927|
|M, Var, Med, Mod, S||0.928|
|M, Var, Mod||0.959|
|M, Var, Mod, S||0.961|
|M, Var, Med, Mod, S||0.961|
|M, Var, S||0.905|
|M, Var, S, K||0.909|
|M, Var, Med, S, K||0.917|
The conclusions are the following. The use of more input parameters leads to larger (better) R2 for both filters and both metrics. The benefit of using several input parameters instead of one is quite small for IPSNR, where R2 for one-parameter prediction is already quite high. Meanwhile, for the visual quality metric IPHVSM, the improvement is quite large. Interestingly, the use of median of local estimates instead of the mean considerably improves prediction (compare the data in Tables 2 and 1) for IPHVSM for the DCT-based filter and P2σ.
More input parameters provide better prediction. At the same time, more time is needed for calculation of input parameters (although their calculation is not difficult). Then, a compromise solution could be the use of the dependence of the type
where denotes the local estimates of probabilities obtained in blocks. The approximation coefficients for all cases are presented in Table 3.
The expression (20) is not the only way to combine several input parameters into a joint output. Neural networks (NN) are known to perform this task rather well and to be good approximators . This property has been used by us in  to make the neural network predict the considered metrics based on multiple input parameters. The obtained results are practically the same as in Table 3. Therefore, there is no need to use a more complex NN approximator instead of expression (20).
A more reasonable solution is to look for better input parameters. Such a study has been conducted. It has shown that the probability P0.5σ is more informative than P2σ, where P0.5σ is the mean probability that the magnitudes of DCT coefficients in blocks are smaller than 0.5σ. Theoretically, for a zero-mean Gaussian distribution, this probability is approximately 0.38, and DCT coefficients of AWGN are Gaussian. Thus, the mean P0.5σ approaches 0.38 only if the considered image is “very homogeneous” and the noise is intensive. This is taken into account in the studies below.
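The limiting values of these probabilities for pure Gaussian noise follow directly from the error function, as the short check below shows. The function name is illustrative.

```python
import math

def prob_abs_below(t):
    """P(|X| < t*sigma) for a zero-mean Gaussian random variable X."""
    return math.erf(t / math.sqrt(2.0))

p_half = prob_abs_below(0.5)         # about 0.383, the upper limit of mean P0.5sigma
p_two = prob_abs_below(2.0)          # about 0.954, the upper limit of mean P2sigma
p_tail = 1.0 - prob_abs_below(2.7)   # about 0.007, the lower limit of mean P2.7sigma
```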
The obtained results for multiparameter fitting are presented in Table 4. The abbreviations are the same as in Table 2. The first observation is that even for one parameter (mean of local probabilities), the values R2 are sufficiently better than the corresponding values for P2σ. Again the results for the BM3D filter are slightly worse than for the DCT-based filter and the results of predicting IPHVSM are worse than for predicting IPSNR. Again the use of only two input parameters, mean and variance of local estimates, seems to be a good practical choice. Thus, the best parameters of the function (21) are presented for this case in Table 5. Besides, we give an example of scatterplot fitting by 2D surface (function) for two-parameter case of using mean and variance of local estimates of the considered probability for predicting IPHVSM (see Fig. 4).
|M, S, K||0.989|
|M, Med, S, K||0.989|
|M, Var, Med, Mod, S||0.99|
|M, Var, Mod||0.949|
|M, Var, Mod, S||0.951|
|M, Var, Med, Mod, S||0.952|
|M, Var, S||0.978|
|M, Var, Med, S||0.978|
|M, Var, Med, Mod, S||0.978|
|M, Var, Mod||0.939|
|M, Var, Mod, S||0.941|
|M, Var, Med, Mod, S||0.941|
3.2. Analysis for signal-dependent and spatially correlated types of noise
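Two models of signal-dependent noise are considered. The first one is
$$I^{noisy}_{ij} = I^{true}_{ij} + n_{SI\,ij} + n_{SD\,ij}, \tag{22}$$
where $n_{SI\,ij}$ and $n_{SD\,ij}$ denote the signal-independent (SI) and signal-dependent (SD) noise components. Both noise components in expression (22) are assumed to be zero mean, spatially uncorrelated, and Gaussian. Then, the model for the noise variance is $\sigma^2_{ij} = \sigma_0^2 + k_{SD} I^{true}_{ij}$, where $\sigma_0^2$ is the SI noise variance and $k_{SD}$ is the SD noise parameter (which is usually between zero and unity). A second model presumes purely multiplicative noise with $I^{noisy}_{ij} = \mu_{ij} I^{true}_{ij}$, where $\mu_{ij}$ denotes a unity-mean random factor with variance $\sigma_\mu^2$ that lies within the limits from 0 to 1. It is supposed for both models that the noise is spatially uncorrelated.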
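For simulation purposes, the two noise models can be generated as in the sketch below. It assumes the commonly used form in which the variance of the first model grows linearly with the true intensity, and it draws the multiplicative factor from a Gaussian distribution, which is a simplification (real speckle is generally not Gaussian). The function and parameter names are hypothetical.

```python
import numpy as np

def add_signal_dependent_noise(true, sigma0_sq, k_sd, rng):
    """Noise with local variance sigma0_sq + k_sd * true (SI plus SD components)."""
    true = np.asarray(true, dtype=float)
    variance = np.maximum(sigma0_sq + k_sd * true, 0.0)
    return true + rng.normal(size=true.shape) * np.sqrt(variance)

def add_multiplicative_noise(true, sigma_mu_sq, rng):
    """Pure multiplicative noise: true image times a unity-mean random factor."""
    true = np.asarray(true, dtype=float)
    mu = 1.0 + np.sqrt(sigma_mu_sq) * rng.normal(size=true.shape)
    return true * mu
```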
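As mentioned in Section 2 (expression (15)), the local threshold is set as $T_{bl} = \beta\sqrt{\sigma_0^2 + k_{SD}\bar{I}_{bl}}$ for signal-dependent noise (expression (22)) and as $T_{bl} = \beta\sigma_\mu\bar{I}_{bl}$ for pure multiplicative noise, where $\bar{I}_{bl}$ is the block mean, $\sigma_0^2$ and $k_{SD}$ are the SI noise variance and the SD noise parameter, and $\sigma_\mu^2$ is the variance of the multiplicative noise factor. In addition to modifying the filtering algorithm, the algorithm for calculating the input parameter has to be modified so that the local probability estimate takes into account the local variation of the noise standard deviation. For instance, the local estimate of probability P2σ for an 8 × 8 block is obtained as
$$\hat{P}_{2\sigma}(bl) = \frac{1}{63}\sum_{(q,s)\neq(0,0)} \delta_{bl}(q,s),$$
where $\delta_{bl}(q,s) = 1$ if $|D_{bl}(q,s)| \le 2\sigma_{bl}$ and 0 otherwise; here $D_{bl}(q,s)$ is the qs-th DCT coefficient of the block and $\sigma_{bl}$ is equal to $\sqrt{\sigma_0^2 + k_{SD}\bar{I}_{bl}}$ or to $\sigma_\mu\bar{I}_{bl}$, depending on the model used. The DC component of the DCT coefficients in blocks is not taken into account, as it always exceeds the local threshold.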
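The locally adaptive estimate can be sketched as follows. The dependence of the noise standard deviation on the block mean is passed in as a function, so that either noise model can be plugged in; the block mean is estimated from the noisy data, which is an assumption of this sketch, and the names are illustrative.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def local_p2sigma(noisy, sigma_of_mean, block=8, step=8):
    """Per-block estimates of P2sigma with a signal-dependent local sigma."""
    noisy = np.asarray(noisy, dtype=float)
    c = dct_matrix(block)
    estimates = []
    rows, cols = noisy.shape
    for r in range(0, rows - block + 1, step):
        for s in range(0, cols - block + 1, step):
            patch = noisy[r:r + block, s:s + block]
            local_sigma = sigma_of_mean(patch.mean())   # sigma for this block
            ac = np.abs(c @ patch @ c.T).ravel()[1:]    # AC coefficients only
            estimates.append(np.mean(ac <= 2.0 * local_sigma))
    return np.array(estimates)
```

For example, `local_p2sigma(image, lambda m: np.sqrt(100.0 + 0.5 * m))` would correspond to a signal-dependent model with hypothetical parameter values; the mean of the returned array is the input parameter used for prediction.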
Some of the results of studies in our papers ,  are presented next. One aspect that was specially addressed in these studies was to check the influence of an image set used in forming a scatterplot. In fact, two scatterplots have been formed separately: for the set of standard images used in optical image processing as Baboon, Barbara, Lena, etc., and for the set of images called “Remote Sensing” as Frisco, Diego, etc. The reason for such study was the following fact. Some people from RS community are categorically against using standard gray-scale test images in their studies although there are no commonly accepted sets of test RS images.
The methodology of obtaining the scatterplot was modified slightly. For the noise model (22), three different cases were simulated: prevailing influence of SI noise, dominant influence of SD noise, and comparable contribution of both components. As a result, a wide range of mean P2σ has been provided. In the resulting scatterplot (Fig. 5), points that belong to different image sets are indicated by different signs (and different colors), and two fitted curves are shown. We believe there is no essential difference between the scatterplots and fitted curves for the two sets. Thus, it can be concluded that the prediction is quite universal and suitable for conventional gray-scale optical images and component-wise (single-channel) RS images. Moreover, it has been shown that prediction is valid for single-look SAR images corrupted by fully developed spatially uncorrelated speckle. It is also possible to compare the results in Fig. 5 with the data in Fig. 3(b); they are very similar. Fig. 5 shows that IPSNR is approximately 1 dB or less for P2σ of approximately 0.5, in which case denoising is practically useless. Meanwhile, IPSNR is approximately 4 dB for P2σ of approximately 0.8, in which case the use of filtering is expedient. The parameter R2 for both fitted curves in Fig. 5 is approximately 0.96, that is, the prediction is approximately as good as for the AWGN case. Again, the results for P2σ are better than for P2.7σ, and fitting for IPSNR is more accurate than for IPHVSM. Improved fitting by means of multiple input parameters has not been investigated yet.
Two examples of image processing are presented here. Fig. 6(a) represents the noisy image Frisco, where noise parameters are σ02=100; , and . The output image for the DCT-based filter is presented in Fig. 6(b). The effect of denoising is obvious. Actual provided improvement of PSNR is equal to 9.77 dB. The predicted value for mean P2σ=0.92 is approximately 9.5 dB (see the blue fitted curve in Fig. 5), that is, there is good agreement of attained and predicted values. Prediction shows that it is worth applying denoising in this case.
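The comparison of attained and predicted improvement relies on IPSNR, the gain in PSNR due to filtering. A minimal sketch of how the attained value could be computed in simulations is given below. It assumes that IPSNR is the difference between the output and input PSNR (the defining expression is not reproduced in this section), that images are flat sequences of pixel values, and that the peak value is 255; the function names are illustrative only.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak=255 is an assumption for 8-bit data)."""
    err = mse(reference, test)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)

def ipsnr(reference, noisy, filtered, peak=255.0):
    """Assumed definition: PSNR after filtering minus PSNR before filtering, in dB."""
    return psnr(reference, filtered, peak) - psnr(reference, noisy, peak)
```

A positive value means that filtering has reduced the MSE with respect to the noise-free image. This is only available in simulations, where the noise-free image is known.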
For real-life data, it is impossible to determine the true values of the metrics that characterize filtering efficiency. However, it is possible to analyze the predicted values and to inspect the denoising results visually. Such an analysis was done for fragments of sub-band images of the hyperspectral sensor Hyperion, for which the parameters of the noise model (22) have been blindly estimated. The noisy image for the 13th sub-band of the set EO1H1800252002116110KZ is depicted in Fig. 7(a). Noise is clearly seen. The predicted IPSNR is approximately 8.5 dB and the predicted IPHVSM is approximately 5.7 dB. Thus, it is expedient to perform denoising. The denoised image is presented in Fig. 7(b). As can be seen, its quality has improved considerably due to filtering.
The sub-bands 13...22 are considered for two sets of Hyperion data. The values of IPSNR are always larger than those of IPHVSM. This means that it is harder to improve the visual quality of an image than to gain improvement according to standard metrics (MSE, PSNR). For the sub-bands with indices k = 13...16, IPSNR is always larger than 1.6 dB and IPHVSM exceeds 0.6 dB, that is, filtering is desirable. For the other sub-bands, the predicted improvements are small, so it is doubtful whether filtering is worth carrying out. Visual inspection of images in sub-bands with k = 17...22 has shown that noise is either hardly noticeable or practically invisible. The positive effect of its removal is partly or fully compensated by the edge/detail/texture smearing introduced by any filter, even the most sophisticated one. Texture filtering is always problematic, and the prediction approach is able to reliably predict this.
Considering the benefits achieved by using mean P0.5σ as the input parameter, an analysis similar to the one presented in Fig. 5 has been performed. The results are presented in Fig. 8. The noise is signal-dependent and most scatterplot points correspond to the noise model (22). The curve is fitted employing all points (although they relate to both the optical and the RS subsets). The fit is very good and, according to quantitative criteria, it is better than for the parameter P2σ (Fig. 5). Four black points in the scatterplot in Fig. 8 correspond to one-look SAR images. They fit the curve well and have arguments close to the maximal potential limit (0.38), where IPSNR attains very large values (approximately 10 dB and more).
Additional studies concentrated on multi-look SAR images corrupted by pure multiplicative noise. Analysis has been done for speckle variance , where L denotes the number of looks. Scatterplot points are presented in Fig. 9 for different numbers of looks. An obvious tendency is that mean P0.5σ becomes larger and IPSNR increases for a smaller number of looks. Other conclusions that can be drawn from that analysis are the following. Prediction is possible for filtering techniques with and without VST, and the prediction quality is better in the latter case. Prediction using different types of functions (polynomial, power, exponential) produces fits of approximately equal accuracy. Meanwhile, the accuracy of prediction is worth improving (RMSE is approximately 1 dB) since it is considerably worse than for the case of AWGN.
Since noise can be spatially correlated in practice, the cases of spatially correlated additive noise and spatially correlated multiplicative noise have also been studied. A difficulty in dealing with spatially correlated noise is that there are numerous shapes (and parameter sets) of the 2D auto-correlation function or spatial spectrum of such noise. Thus, studying a particular case of spatially correlated noise gives only limited information on general dependences. Hence, two models of spatially correlated noise (called middle correlation and strong correlation) have been considered. A peculiarity of prediction is that the local estimate of probability P2σ is obtained according to expression (23), where, in the general case, and 0 otherwise (is the local standard deviation in a considered block; expressions for its derivation depending upon noise model are given above). If the probability P0.5σ is used, the condition is and 0 otherwise.
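To make the role of the local probability estimates more concrete, a minimal sketch is given below. Expression (23) is not reproduced here, so the sketch rests on assumptions: the local estimate is taken as the fraction of AC DCT coefficients in an 8x8 block whose magnitude does not exceed beta times the local noise standard deviation (beta = 2 for P2σ, beta = 0.5 for P0.5σ), the DC coefficient is excluded, and an optional matrix of per-frequency weights (a hypothetical parameter) stands in for the normalized noise spectrum in the spatially correlated case. All names are illustrative.

```python
import math

N = 8  # block size (an assumption; 8x8 blocks are common for DCT-based processing)

# Orthonormal DCT-II basis: C[k][n] = a(k) * cos(pi * (2n + 1) * k / (2N))
C = [[(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
      * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
      for n in range(N)] for k in range(N)]

def dct2(block):
    """2D orthonormal DCT of an N x N block given as a list of rows."""
    tmp = [[sum(C[k][n] * block[n][m] for n in range(N)) for m in range(N)]
           for k in range(N)]  # tmp = C * block
    return [[sum(tmp[k][m] * C[l][m] for m in range(N)) for l in range(N)]
            for k in range(N)]  # result = tmp * C^T

def local_probability(block, sigma, beta=2.0, weights=None):
    """Fraction of AC coefficients with |D(k,l)| <= beta * sigma * w(k,l)."""
    coeffs = dct2(block)
    hits = 0
    for k in range(N):
        for l in range(N):
            if k == 0 and l == 0:
                continue  # the DC term reflects the block mean and is skipped
            w = 1.0 if weights is None else weights[k][l]
            if abs(coeffs[k][l]) <= beta * sigma * w:
                hits += 1
    return hits / (N * N - 1)

def mean_probability(image, sigma, beta=2.0, step=N):
    """Mean of the local estimates over blocks of a 2D image (list of rows)."""
    rows, cols = len(image), len(image[0])
    estimates = []
    for r in range(0, rows - N + 1, step):
        for c in range(0, cols - N + 1, step):
            block = [row[c:c + N] for row in image[r:r + N]]
            estimates.append(local_probability(block, sigma, beta))
    if not estimates:
        raise ValueError("image is smaller than one block")
    return sum(estimates) / len(estimates)
```

A flat image yields a mean probability of 1 (every AC coefficient is below the threshold), whereas a highly textured block yields a smaller value, which is consistent with the observation that small probabilities correspond to complex image content.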
The scatterplots and fitted curves are presented in Fig. 10. The fitted curves are similar and they clearly show that there is no reason to filter images if P0.5σ is smaller than 0.15. The difference in the scatterplots for IPHVSM and IPSNR is that the latter one is more compact and, thus, IPSNR can be predicted more accurately. An additional distinctive feature of the plot for IPSNR is that its maximal values are smaller than for AWGN case (data in Fig. 3(b)). The scatterplots for a strong correlation of the noise and the conclusions derived from them are similar.
We have also studied the case of spatially correlated speckle . It has been shown that the prediction seems possible for a spatially correlated noise. However, more research is needed to understand how to select a parameter or several parameters to characterize spatial correlation and how it can be involved in prediction.
Finally, preliminary research has been carried out for denoising color images corrupted by AWGN with equal variance values in the channels. There are two differences in prediction. First, all DCT coefficients in a 3D block are subject to analysis for estimating the local probabilities. Second, the metric PSNR-HMA, which is a color extension of PSNR-HVS-M, and the improvement of this metric due to filtering, defined similarly to expression (8), have been used. In addition, instead of BM3D, its color version called C-BM3D has been analyzed.
The scatterplots have been obtained and curves have been fitted to them (see examples in Fig. 11). As mentioned earlier, filtering is useless for P0.5σ < 0.15. However, this happens rarely (only for highly textured images when the noise standard deviation is small). Another observation is the same as earlier: improvement of visual quality is predicted less accurately than IPSNR. The prediction accuracy for C-BM3D is worse than for the 3D DCT filter.
Taking into account our previous experience, the multiparameter input was analyzed with the exponential function expressed in (20). Considerable improvement has been reached, especially for IPHVSMA, for the 3D DCT filter. For the C-BM3D filter, the positive effect is smaller: R2 equals 0.8481 for one input parameter and 0.8555 for four parameters. Again, a reasonable practical solution is to use the mean and variance of local estimates of probability. One more important observation for color image filtering is that P0.5σ for the 3D filter is larger than for the DCT filter applied to components of a processed color image. This again shows that 3D processing of color and multichannel images is potentially more efficient than their component-wise denoising.
4. Prediction in lossy compression of noisy images
In this section, the compression of images corrupted by AWGN is considered. Lossy compression is carried out by the aforementioned coder AGU with . In this case, OOP may exist or be absent. The task is to predict IPSNR and IPHVSM and to decide whether OOP exists as well as to predict what CR is.
4.1. Prediction of OOP existence and metrics’ values in it
This section briefly describes how the scatterplots were obtained. As in the filtering case, a set of gray-scale test images of different content and complexity was used. AWGN of different intensity has been added and then the obtained images have been compressed by AGU. After this, the parameters (12) and (13) as well as P2σ have been calculated for each compressed image. Clearly, all these actions are done off-line before applying the prediction approach in practice.
The obtained scatterplot is presented in Fig. 12. A specific feature of this scatterplot is that it contains negative values, which are approximately −3.5 dB for P2σ approaching zero. Therefore, not all fitting functions can be used. The study carried out by Zemliachenko et al. has shown that polynomials of the fourth and fifth order usually approximate the dependence very well (with R2 almost equal to unity and RMSE of approximately 0.25 for IPSNR). As can be seen from the scatterplot in Fig. 12, there are quite a few images and/or noise variances for which OOP does not exist (IPSNR is negative). OOP exists with high probability if P2σ exceeds 0.82. This can be used as a basis for predicting OOP existence.
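The curve fitting and the goodness-of-fit measures mentioned above (R2 and RMSE) can be sketched as follows. This is a minimal illustration, not the procedure used in the cited study: it fits a polynomial by ordinary least squares through the normal equations (adequate for low orders and arguments in [0, 1], although QR-based solvers are numerically preferable), and it assumes the usual definitions R2 = 1 − SS_res/SS_tot and RMSE = sqrt(SS_res/n). The function names are hypothetical.

```python
import math

def polyfit(x, y, degree):
    """Least-squares polynomial coefficients [c0, c1, ..., c_degree]."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T y for the Vandermonde matrix A.
    ata = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    aty = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(ata[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aty[i] - s) / ata[i][i]
    return coeffs

def polyval(coeffs, x):
    """Evaluate the polynomial at x using Horner's scheme."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def goodness_of_fit(x, y, coeffs):
    """Return (R2, RMSE) of the fitted polynomial on the scatterplot points."""
    pred = [polyval(coeffs, xi) for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y))
```

In this setting, x would hold the mean P2σ values of the scatterplot points and y the corresponding IPSNR (or IPHVSM) values; once the coefficients are stored, prediction for a new image reduces to one call of `polyval`.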
The scatterplot for the metric IPHVSM is presented in Fig. 13. In some sense, behavior of the fitted polynomial is similar to the one in Fig. 12. There are many values about −4 dB showing that due to lossy compression the visual quality becomes worse. However, this mainly happens for small P2σ that corresponds to high-complexity images and/or low level of the noise. The visual quality improves for P2σ exceeding 0.9 and this takes place for low-complexity images and rather intensive noise.
Although prediction has been studied by simulations only for images corrupted by AWGN, it can also be applied to images corrupted by signal-dependent spatially uncorrelated noise, provided that a proper VST is applied to them before compression. Such a VST (a generalized Anscombe transform in this case) provides an approximately constant noise variance that is usually equal to unity. Thus, QS = 4 is used. This approach has been applied to Hyperion data and the results are presented in Fig. 14. There are two groups of sub-bands that are usually not analyzed in Hyperion data since they are too noisy. Thus, the predicted values are not given for all sub-bands. Analysis of the presented values shows that there are only a few sub-bands where OOP can be expected. For most other sub-bands, IPSNR is about −3 dB, and ways of dealing with them are considered in a separate study. One proposition is to set a smaller QS, but this leads to a smaller CR.
Fig. 15 shows the original and the decompressed images in the 110th sub-band, for which a decrease of visual quality according to quantitative criteria is predicted. Noise is not seen in the original image and the compression has practically no influence on the image quality (in our opinion, both images look the same).
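The variance stabilization step can be illustrated by the sketch below. It assumes that the noise model (22), which is not reproduced in this section, has the form of a local variance sigma0_sq + k·I for true intensity I, and it uses the standard generalized Anscombe transform for such a model. The parameter names `k` and `sigma0_sq` are illustrative, and the inverse shown is the plain algebraic one; more accurate (unbiased) inverses exist but are outside the scope of this sketch.

```python
import math

def gat_forward(value, k, sigma0_sq):
    """Generalized Anscombe transform for noise variance sigma0_sq + k * I (assumed model)."""
    arg = k * value + 0.375 * k * k + sigma0_sq
    return (2.0 / k) * math.sqrt(max(arg, 0.0))  # clip to avoid a negative argument

def gat_inverse(t, k, sigma0_sq):
    """Direct algebraic inverse of gat_forward (biased for low intensities)."""
    return ((k * t / 2.0) ** 2 - 0.375 * k * k - sigma0_sq) / k

def stabilized_std(intensity, k, sigma0_sq, h=1e-3):
    """First-order (delta-method) estimate of the noise std after the transform."""
    slope = (gat_forward(intensity + h, k, sigma0_sq)
             - gat_forward(intensity - h, k, sigma0_sq)) / (2.0 * h)
    return slope * math.sqrt(sigma0_sq + k * intensity)
```

For intensities that are not too small, `stabilized_std` is close to 1, which is why a fixed quantization step (QS = 4 in the text) can be used for all sub-bands after the transform.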
A related study also presents data for three other DCT-based coders, two of which are specially designed to provide better visual quality. It is demonstrated that the adaptive DCT (ADCT) coder, which exploits optimized partition schemes, provides certain improvements compared to AGU. Meanwhile, DCT coders oriented toward improving visual quality do not offer substantial benefits when applied to noisy images and, moreover, are even less efficient in many practical situations.
4.2. Prediction of compression ratio in OOP
The methodology of predicting CR in OOP is the same as that for filtering. It is based on obtaining a scatterplot and fitting a curve to it. The only difference is that the vertical axis relates to CR, while the horizontal axis, as earlier, corresponds to the mean probability. Two mean probabilities, P2σ and P2.7σ, have been considered, and the latter again proved to be worse. Therefore, only the results obtained for the mean probability P2σ are presented below.
Two lossy compression methods, namely, the coders AGU and ADCT, have been studied. Their scatterplots are presented in Fig. 16. Contrary to other cases considered above, fitting is performed using a sum of two weighted exponential functions. As can be seen, fitting in both cases is very good with R2 exceeding 0.99. Slightly larger values of CR are provided by the more sophisticated coder ADCT . Very large (over 20) values of CR are provided for P2σ > 0.93, that is, for simple structure images corrupted by intensive noise.
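A sum of two weighted exponential functions can be written as CR(P) = a·exp(b·P) + c·exp(d·P). The sketch below shows one simple way to fit such a model; it is not necessarily the fitting procedure used for Fig. 16. The rates b and d are searched over a user-supplied grid, and for each pair the weights a and c are found in closed form by linear least squares. The grid, the function names, and the synthetic values in the usage note are assumptions made for illustration.

```python
import math

def model(p, a, b, c, d):
    """Sum of two weighted exponentials evaluated at mean probability p."""
    return a * math.exp(b * p) + c * math.exp(d * p)

def fit_two_exponentials(p, cr, rates):
    """Grid-search the rates (b, d); solve the weights (a, c) by linear least squares.

    Returns ((a, b, c, d), sum_of_squared_errors).
    """
    best = None
    for i, b in enumerate(rates):
        for d in rates[i + 1:]:
            e1 = [math.exp(b * x) for x in p]
            e2 = [math.exp(d * x) for x in p]
            s11 = sum(u * u for u in e1)
            s12 = sum(u * v for u, v in zip(e1, e2))
            s22 = sum(v * v for v in e2)
            t1 = sum(u * y for u, y in zip(e1, cr))
            t2 = sum(v * y for v, y in zip(e2, cr))
            det = s11 * s22 - s12 * s12
            if abs(det) < 1e-12 * s11 * s22:
                continue  # the two basis functions are nearly collinear
            a = (t1 * s22 - t2 * s12) / det
            c = (t2 * s11 - t1 * s12) / det
            sse = sum((model(x, a, b, c, d) - y) ** 2 for x, y in zip(p, cr))
            if best is None or sse < best[0]:
                best = (sse, a, b, c, d)
    if best is None:
        raise ValueError("need at least two distinct rates that give a solvable system")
    return best[1:], best[0]
```

As a usage example on synthetic (not measured) data, points generated with `model(x, 2.0, 1.0, 1e-4, 12.0)` and the grid `[0.0, 1.0, 2.0, 4.0, 8.0, 12.0, 16.0]` recover the generating rates 1 and 12. With real scatterplot data a finer grid or a general nonlinear least-squares solver would be a more natural choice.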
We did not have real-life multichannel images corrupted by AWGN. However, hyperspectral data from the sensors Hyperion and airborne visible/infrared imaging spectrometer (AVIRIS) were available. Noise in them is signal dependent, with a prevailing SD component for the model (22). The parameters of this noise were estimated in an automatic manner and, thus, it became possible to apply a VST (a generalized Anscombe transform with properly adjusted parameters), thereby converting the noise into purely additive noise with unit variance.
Lossy compression in the OOP neighborhood has been applied after the VST. After decompression, the inverse transform has to be applied. The attained and predicted values of CR for Hyperion data are depicted in Fig. 17(a). As can be seen, the curves are in good agreement. There are some channels where the predicted CRs are slightly larger than the attained ones. This is explained by the imperfection of the VST and of the blind estimation of noise parameters for channels with high signal-to-noise ratio. The largest CRs take place for sub-bands with low SNR (these are the sub-bands with indices 13–20, 125–130, and 175–180).
The results for the AVIRIS test image Lunar Lake are given in Fig. 17(b). Here, the agreement between the predicted and the attained values is even better than for the Hyperion data. Again, the largest CR is observed for sub-bands with low SNR. There are considerable differences in maximal and minimal values of CR. The main reason is the different SNR and different dynamic range in sub-band images. Certainly, CR also depends upon the image content.
5. Conclusions and future work
It is demonstrated that it is possible to predict the efficiency of image filtering as well as the parameters of lossy compression of a noisy image in the OOP neighborhood. As opposed to the earlier known approaches that allow predicting the potential efficiency of filtering, the present approach predicts practically reachable performance and does so very rapidly, one or more orders of magnitude faster than filtering or compression itself.
Certainly, only a limited number of quality metrics, filtering techniques, and compression techniques have been considered. However, it is important that a general methodology of prediction is proposed, and it is shown that there are fairly strict connections between simple input parameters (that can be easily and quickly calculated) and output parameters that adequately characterize the efficiency of filtering or lossy compression techniques. Two facts speak in favor of this methodology. First, there are many modern filters that have filtering efficiency of the same order as the DCT-based filter and BM3D. Thus, by predicting denoising efficiency for the filters mentioned above, it is possible to approximately predict performance for other modern filters (although such prediction would be less accurate). Second, the same holds for lossy compression methods. For example, AGU and JPEG2000 provide similar performance characteristics. Then, by predicting compression parameters for AGU, they are, in fact, estimated for JPEG2000 as well.
Concerning the decision making, whether to perform filtering or not, strict recommendations have been given for probabilities P2σ and P0.5σ. Filtering can be expedient if P2σ exceeds 0.5 or P0.5σ exceeds 0.15. Similarly, OOP is quite possible if P2σ is approximately 0.85 or larger. A very important fact is that these rules for filtering are valid for different types of noise (pure additive and signal-dependent, additive white Gaussian and spatially correlated). This generalization can be considered as one of the main contributions of this chapter. Meanwhile, the case of spatially correlated noise requires more attention in future. In prediction of filtering efficiency, general prediction approximations for spatially correlated noise with a priori known or pre-estimated properties (e.g., 2D spectrum) have not been obtained yet. It can only be expected that the scatterplots for spatially correlated noise with other (not analyzed yet) shapes and parameters of spatial power spectrum behave similarly. The studies for lossy compression of images corrupted by spatially correlated noise are yet to be started. This opens a very wide field for future research.
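The decision rules summarized above are simple enough to be expressed directly in code. The sketch below is only an illustration of how they could be automated: the thresholds are the ones quoted in this paragraph (Section 4.1 quotes 0.82 rather than 0.85 for OOP existence, so the value is kept configurable), and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Thresholds:
    filter_p2s: float = 0.5    # filtering can be expedient if mean P2σ exceeds this
    filter_p05s: float = 0.15  # ... or if mean P0.5σ exceeds this
    oop_p2s: float = 0.85      # OOP is quite possible at or above this mean P2σ

def should_filter(mean_p2s=None, mean_p05s=None, th=Thresholds()):
    """Decide on filtering from whichever mean probability is available."""
    if mean_p2s is None and mean_p05s is None:
        raise ValueError("at least one mean probability is required")
    if mean_p05s is not None:
        # Design choice of this sketch: prefer P0.5σ when both are given.
        return mean_p05s > th.filter_p05s
    return mean_p2s > th.filter_p2s

def oop_expected(mean_p2s, th=Thresholds()):
    """Decide whether an optimal operation point is likely in lossy compression."""
    return mean_p2s >= th.oop_p2s
```

For instance, `should_filter(mean_p2s=0.92)` returns `True`, in line with the Frisco example where mean P2σ = 0.92 and denoising was found worthwhile.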
The results of this research show that, although the prediction of performance characteristics based on one input parameter is sometimes sufficiently accurate, there are several means to improve the prediction accuracy. One way, which deals with multiparameter input, has already been used for particular cases. The use of mean P0.5σ has proved to be a good solution, although it has not yet been tried for all possible applications. In particular, mean P0.5σ has not been tested for lossy compression, where it may also improve the prediction accuracy. Neural networks or other approximators of multidimensional functions (surfaces) can be useful as well.
There are also other possible directions for future research. 3D filtering warrants a more thorough study, at least for the case of more than three channels. The same relates to the performance of 3D lossy compression, which no one has yet attempted to predict. Compression parameters for QS values other than the one recommended for OOP are also of considerable interest in DCT-based lossy compression. The influence of errors in a priori information on noise parameters, or in their blind estimates, on prediction accuracy has to be studied as well.