## 1. Introduction

Remote sensing (RS) is an application area where compression of images acquired on board an aircraft or a spacecraft is a very important task [1]. Its importance stems from continuing trends toward better sensor spatial resolution, more frequent observation of sensed terrains, a larger number of exploited channels (e.g., in multi- and, especially, hyperspectral sensing), etc. [2]. Meanwhile, the communication channel bandwidth and the time available for data transfer can be limited [1, 3, 4]. On-board data-processing facilities can be restricted. Possibilities of lossless image compression are often limited as well [4]. Even the best existing methods of lossless compression applied to hyperspectral data, fully exploiting the interband correlation inherent to such images, provide a compression ratio (CR) of only about 4.5 [4, 5], and this is often not enough. Thus, there is a need for efficient methods of lossy compression of acquired multichannel images.

There are several peculiarities of lossy compression as applied to multichannel remote sensing images. First, if it is performed on-board, full or partial automation is required [1, 6]. Second, lossy compression is reasonable and useful only if the introduced losses have no essential impact on the value of the compressed data, i.e., if the accuracy and reliability of information extracted from compressed images are approximately at the same level as from the original (uncompressed, or losslessly compressed) data. In this sense, the introduced losses should be smaller than, or in the worst case comparable to, the original image distortions due to noise [7]. This means that image-processing (compression) methods should be adaptive to noise characteristics. Meanwhile, the noise in images acquired by modern multichannel RS sensors is not additive and has a more complicated nature [8–11]. Thus, either blind estimation of its characteristics or the use of available *a priori* information is needed. Third, adaptation to other specific properties of subband images is desired. Here, we mean that images in different channels might have considerably different dynamic ranges, signal-to-noise ratios, and interchannel correlation factors [8, 12, 13].

All of these influence the efficiency of lossy compression and open perspectives for its improvement. Meanwhile, all or some of the aforementioned peculiarities of multichannel RS images are often ignored in the design of lossy compression techniques.

On the one hand, it is well understood that high interchannel correlation should be exploited to obtain a sparser representation of the data and to reach a higher CR than for component-wise compression [14–16]. On the other hand, there are many different ways to realize this. Different transforms can be used [17–20]. Component image grouping can be organized in different manners [15, 21, 22], and so far there are no strict rules concerning the best way to do this or the maximal benefit achievable, in terms of CR, compared to component-wise compression under the condition of the same or smaller introduced distortions.

Noise characteristics and the different dynamic ranges of data in component images are often not taken into account in lossy compression either. Little attention has been paid to these aspects in the design of lossy compression techniques for the considered application, although it is clear that they are important and restrict the applicability of methods designed for other types of multidimensional data [3, 20, 23].

Requirements for lossy compression of multichannel images and their priority have to be taken into consideration as well. The main requirements [1, 3, 20] are the following. First, the introduced distortions should not negatively influence the efficiency of solving further tasks of multichannel image processing such as classification, object detection, visual inspection, etc. Only under this condition do the compressed data remain of practically the same value as the original images. This means that the introduced distortions should be smaller than or of the same order as the noise in each component (channel) image. Second, there can be a necessity to provide a CR not smaller than some limit value, or a desire to provide as large a CR as possible. Third, lossy compression and the operations associated with it (preliminary analysis of data, some transformations and/or normalizations, etc.) have to be quite simple, especially if one deals with lossy compression on-board. Fourth, there can be recommendations or restrictions imposed on the standardization of lossy compression or on its mathematical basis. Currently, there are no standards for lossy compression of multichannel RS images, although special efforts are being put toward their creation [3]. In addition, it is understood that most of the aforementioned requirements can be met on the basis of 2D or 3D orthogonal transforms under the condition of proper preparation of multichannel images for compression [20].

In this chapter, we focus on the aspects of automation and adaptation of lossy compression with application to multichannel image processing. First, we show that the noise is signal dependent, with its signal-dependent component either of the same order as the signal-independent (additive) one or dominant [6, 8, 9]. Second, we show how this property can be taken into account at the lossy compression stage by applying a proper variance stabilizing transform (VST) in a component-wise manner [20, 24]. Third, we analyze the peculiarities of lossy compression in the neighborhood of the so-called optimal operation point (OOP), where the introduced losses, characterized by the mean square error (MSE), are of the same order as the equivalent noise variance [25]. Fourth, we demonstrate that there is a quite strict relation between OOP existence, the CR attained in it, and some statistical parameters of noisy images [25, 26]. Moreover, there are quite easy methods to provide a desired CR by exploiting these statistics [27]. Fifth, we discuss and compare component-wise and 3D compression. Special attention is paid to the advantages of the latter approach [28, 29], and the choice of group size is discussed in more detail.

## 2. Image and noise models and their parameters

While 10–20 years ago it was usually assumed that the noise is additive in all components of multichannel remote sensing data [30], studies carried out by different researchers [9, 10] indicate that the following image/noise model is more adequate:

*I*^{n}_{kij} = *I*^{t}_{kij} + *n*^{SI}_{kij} + *n*^{SD}_{kij} (1)

Here, *I*^{n}_{kij} is the *ij*th value of the *k*th component of the observed (noisy) multichannel image, *I*^{t}_{kij} is the corresponding true value, *n*^{SI}_{kij} is the signal-independent (SI) noise, and *n*^{SD}_{kij} is the *ij*th value of the signal-dependent (SD) noise in the *k*th component image. To indicate that the noise is signal dependent, we use the notation *n*^{SD}_{kij}; the indices *kij* address a voxel, *I* and *J* define the data size, and *K* denotes the number of components. For multi- and hyperspectral images, model (1) transforms to

σ^{2}_{kij} = σ^{2}_{SI,k} + *m*_{k} *I*^{t}_{kij} (2)

where σ^{2}_{kij} is the noise variance in the *kij*th voxel, σ^{2}_{SI,k} is the SI noise variance, and *m*_{k} is the SD noise parameter in the *k*th component. Then, the input (equivalent) MSE in the *k*th component is

MSE^{inp}_{k} = σ^{2}_{SI,k} + *m*_{k} *Ī*_{k} (3)

where *Ī*_{k} is the mean of the *k*th component image, and the input PSNR

PSNR^{inp}_{k} = 10log_{10}(*D*^{2}_{k}/MSE^{inp}_{k}) (4)

where *D*_{k} is the dynamic range of the *k*th component image. One can estimate the equivalent noise variance for the SD component as

σ̂^{2}_{SD,k} = *m̂*_{k} *Ī*_{k} (5)

or, equivalently, as

σ̂^{2}_{SD,k} = MSE^{inp}_{k} − σ̂^{2}_{SI,k} (6)

where σ̂^{2}_{SI,k} denotes the SI noise variance estimate (assumed accurate enough) for the *k*th component of the multichannel image.
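To make the model concrete, here is a minimal Python sketch (our own illustration, not code from the chapter) that computes the equivalent input MSE and input PSNR of a component image for a noise model with an SI variance term plus an SD term proportional to local intensity; all function and variable names are ours:

```python
import numpy as np

def input_mse(image, sigma2_si, m_sd):
    """Equivalent input MSE for a component image under the model
    sigma^2(I) = sigma2_si + m_sd * I (SI part plus signal-dependent part)."""
    return sigma2_si + m_sd * float(np.mean(image))

def input_psnr(image, sigma2_si, m_sd):
    """Input PSNR using the component's dynamic range D = max - min."""
    d = float(np.max(image) - np.min(image))
    return 10.0 * np.log10(d ** 2 / input_mse(image, sigma2_si, m_sd))

# Toy component image with a nonzero dynamic range.
img = np.full((8, 8), 100.0)
img[0, 0] = 0.0

mse_k = input_mse(img, 4.0, 0.04)   # SI variance 4, SD slope 0.04 (assumed)
psnr_k = input_psnr(img, 4.0, 0.04)
```

Note how, for a bright image, even a small SD slope makes the SD contribution comparable to or larger than the SI variance, which is exactly the situation reported for real sensors below.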

It is important to know how large the relative contribution of the SD noise component to the input MSE is. To get an idea of this, the estimates of the SD and SI variance components have been derived and graphically compared [25]. The plots for the AVIRIS [31] (224 subbands in the visible and infrared optical ranges) and Hyperion [32] (242 subbands in the same ranges; for some very noisy subbands the estimates have not been obtained) sensors are represented in **Figure 1** in logarithmic scale (since these estimates vary within very wide limits). Estimates for the subbands in which negative values were produced are omitted from the plots.

Analysis of the data presented in **Figure 1** shows the following. For Hyperion data (**Figure 1a**), in most subbands of the visible and near-infrared ranges (subbands with indices from 13 to 61), the SD variance component is larger than the SI one, i.e., the SD component contribution prevails. In the infrared range (subbands with indices from 78 to 230; **Figure 1a**), the subbands where the SD component dominates and those where the SI component dominates occur in approximately equal percentages. According to our experiments, similar conclusions can be drawn for other real-life images acquired by the Hyperion sensor.

The results for three widely known test datasets acquired by the AVIRIS sensor are given in **Figure 1(b)**–**(d)**. Their analysis allows concluding the following. All three dependences of the same type (for instance, the SD variance estimates) are very similar to each other. Hence, if hyperspectral images are acquired during the same session, one can assume that the noise characteristics do not change. In addition, the SD variance estimates are larger than the SI ones for most images acquired by the visible-range AVIRIS spectrometer (spectrometer A, indices 1, …, 32). The same conclusion is valid for most subbands of the second AVIRIS spectrometer (B, indices 33, …, 96). The contributions of the considered noise components are comparable for the third spectrometer (C, indices 97, …, 160). SI noise dominates for most subband images acquired by the fourth AVIRIS spectrometer (D, indices 161, …, 224). Thus, the contributions of the noise components depend upon the wavelength and the sensor used in a hyperspectral system. In any case, the assumption of purely additive noise is not valid. Moreover, in hyperspectral imaging there is a tendency toward an increasing relative contribution of the SD component [33].

One more important property of multichannel RS images is that the signal components in them are often cross-correlated. Meanwhile, the cross-correlation factor also depends upon the noise intensity in both images and decreases if the noise is intensive in one or both component images. Keeping this in mind, we have chosen for analysis as reference the subband image with *k* = 166, which corresponds to the far-infrared range and is acquired by the fourth spectrometer of AVIRIS. This image is quite noisy, and its input PSNR is less than 30 dB (the dynamic range of this image is small, which is the second reason for the low input PSNR). The dependence of the cross-correlation factor *R* on *k* is presented in **Figure 2**.

Obviously, *R*(166) = 1; for this subband image, the signal-independent noise component prevails, with an input MSE of about 11. We are more interested in *R*(*k*) for the other subbands. Analysis of the data presented in **Figure 2** shows that *R*(*k*) varies within rather wide limits. On average, the values of *R*(*k*) are the largest for the subband images acquired by the fourth AVIRIS spectrometer, for which *k* > 160. Meanwhile, the cross-correlation factors are large enough for subbands relating to the other ranges as well.
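The dependence *R*(*k*) is straightforward to compute from a data cube. A minimal sketch (our own illustration, on a synthetic cube rather than AVIRIS data; function names are ours):

```python
import numpy as np

def cross_correlation_factors(cube, ref):
    """R(k): correlation factor of each subband with a reference subband.
    cube has shape (K, I, J); ref is the reference subband index."""
    x = cube[ref].ravel()
    return np.array([np.corrcoef(x, cube[k].ravel())[0, 1]
                     for k in range(cube.shape[0])])

# Toy cube: three subbands sharing the same signal plus independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(32, 32))
cube = np.stack([base + 0.1 * rng.normal(size=base.shape) for _ in range(3)])
R = cross_correlation_factors(cube, ref=0)
```

By construction, *R* of the reference subband with itself equals one, and noisier subbands would show lower correlation with the reference, mirroring the behavior described above.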

Although the cross-correlation of components in multichannel images is often high, there can also be substantial variation in the dynamic ranges *D*_{k}, where *D*_{k} denotes the dynamic range of the *k*th subband image. For hyperspectral data, the values *D*_{k} and *D*_{k+1}, i.e., those for neighboring subbands, are usually close enough. As follows from the analysis of noise components in **Figure 1**, neighboring channels also commonly have quite close values of input MSE (equal to the equivalent noise variance).

## 3. Considered performance criteria and peculiarities of lossy compression of noisy images

After lossy compression of a multichannel image, one obtains the compressed data {*I*^{c}_{kij}, *i* = 1, …, *I*, *j* = 1, …, *J*, *k* = 1, …, *K*}. If one deals with lossy compression of a noise-free image, then the quality of the compressed image is worse for a larger compression ratio (smaller bpp, larger quantization step (QS), or larger scaling factor for DCT-based coders). The reason is that more distortions are introduced for larger CR.

Meanwhile, many researchers [34–36] have stressed that there are peculiarities in lossy compression of noisy images. Lossy compression leads to a specific noise-removal effect that can be large enough under certain conditions. Due to this, it is possible that the MSE between the compressed and the true (noise-free) image,

MSE^{tc} = (1/(*IJ*))∑_{i=1}^{*I*}∑_{j=1}^{*J*}(*I*^{c}_{ij} − *I*^{t}_{ij})^{2},

is less than the input MSE (equal to the noise variance σ^{2}). **Figure 3** presents dependences of the corresponding metric PSNR^{tc} = 10log_{10}(255^{2}/MSE^{tc}) on QS for the lossy DCT-based coder AGU [37] applied to three standard grayscale noisy test RS images: Airfield, Aerial, and Frisco. All three images were corrupted by additive white Gaussian noise (AWGN) with variance 100. Note that the test image Frisco has a simpler structure, while the test images Aerial and Airfield contain more details. This is the reason why the denoising effect of lossy compression is considerably greater for the image Frisco and the dependence for it has an obvious global maximum. This maximum is the OOP according to the metric PSNR^{tc}. For the test image Aerial, the OOP is not so “obvious,” although it exists. Finally, for the test image Airfield, there is formally no OOP, although the dependence still exhibits a local flattening in the same neighborhood of QS.

For the recommended QS = 4σ, compression is performed in the OOP neighborhood, and one has

MSE^{tc} ≈ σ^{2} − ΔMSE,

where positive ΔMSE characterizes the noise-filtering effect of lossy compression.

Let us consider the dependences of CR on QS for the same noisy test images as in **Figure 3**. These dependences are represented in **Figure 4**. The first observation is that lossy compression with the recommended QS leads to substantially different compression ratios for different images. Recall that the recommended QS is equal to 40 for *σ*^{2} = 100 and to 56 for *σ*^{2} = 200. Images with a simpler structure and/or more intensive noise are compressed with larger CR. For QS = 40 (*σ*^{2} = 100), one has a CR of about 17 for the image Frisco and about 7 for the two other test images. If the noise intensity is greater (QS = 56, *σ*^{2} = 200), larger CR values are attained: about 26 for Frisco and 14 for the other two images. Thus, noisier images are compressed in the OOP with larger CR. This means that image complexity and noise intensity should be taken into account in practice. Some ways to do this are described in the next section.
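The denoising effect of quantization-based lossy compression can be reproduced with a toy coder. The sketch below is our own simplified stand-in for AGU (blockwise DCT with uniform quantization and no entropy coding); it compresses a synthetic noisy image at the recommended QS = 4σ and compares MSE^{tc} with the input MSE. It illustrates the mechanism only, not the authors' implementation:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II transform matrix."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    t = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    t[0, :] = np.sqrt(1.0 / n)
    return t

def toy_dct_coder(noisy, qs, bs=8):
    """Blockwise DCT + uniform quantization of coefficients; mimics the
    quantization stage (not the entropy coding) of AGU-like coders."""
    T = dct_matrix(bs)
    out = np.empty_like(noisy)
    for r in range(0, noisy.shape[0], bs):
        for c in range(0, noisy.shape[1], bs):
            coef = T @ noisy[r:r+bs, c:c+bs] @ T.T
            coef = np.round(coef / qs) * qs       # uniform quantization
            out[r:r+bs, c:c+bs] = T.T @ coef @ T
    return out

rng = np.random.default_rng(1)
sigma = 10.0
true = np.tile(np.linspace(50.0, 200.0, 64), (64, 1))  # simple-structure image
noisy = true + rng.normal(0.0, sigma, true.shape)

mse_inp = float(np.mean((noisy - true) ** 2))          # close to sigma^2
mse_tc = float(np.mean((toy_dct_coder(noisy, 4 * sigma) - true) ** 2))
```

For this smooth, simple-structure image, quantization with QS = 4σ zeroes most noise-only DCT coefficients, so `mse_tc` drops below `mse_inp`: the same effect that creates the OOP for the test image Frisco above.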

## 4. Efficiency for 3D compression

### 4.1. Main dependences and benefits

As mentioned above, compression of multichannel RS images can be carried out component-wise or using variants of the 3D approach. The former case has certain benefits. First, it is easier to handle the data; for example, QS or bpp can be set individually for each component image. Second, part of the operations can be performed in parallel; for example, orthogonal transforms and quantization of coefficients can be performed separately for each component image, and thus this part of the processing can be parallelized. In the latter case, 3D compression can be applied to a multichannel image as a whole [14] or to a set of component image groups [15, 22]. Each variant has its own positive features and drawbacks. If groups are used, it is easier to parallelize computations (since processing can be partly performed separately in each group) and to adjust the compression parameters.

Let us analyze some peculiarities of 3D compression for a rather simple three-channel test image (presented in **Figure 5a**). This image is considered noise free; it has been composed from three channels of the visible range of Landsat RS data, associated with red, green, and blue components for visualization. The noisy image, with artificially added AWGN having the same variance of 130 in all components, is shown in **Figure 5(b)**. The noise is well seen in quasihomogeneous regions.

The plots of the quality of compressed data versus QS are given in **Figure 6(a)**. The notation 2D relates to two-dimensional, i.e., component-wise, compression. For all three components, the plots almost coincide and, therefore, we present the averaged dependence. In turn, the notation 3D concerns 3D compression using the 3D version of the AGU coder [15]. Again, the dependence averaged over all three components is given.

There are several interesting observations from these plots. If QS is rather small, e.g., less than 2*σ*, the dependences for 2D and 3D compression almost coincide; the differences become essential for QS of about 3.5–4*σ*, i.e., when the OOP can be observed. First, the OOP is observed (see **Figure 6a**) for both 2D and 3D compression, but the denoising effect in the OOP is stronger for 3D compression. This is confirmed by **Figure 7**, which presents images compressed in the OOP by the 2D and 3D versions of the AGU coder. Second, the OOP in the case of 3D compression is observed under practically the same conditions as for 2D compression. More examples confirming this can be found in the paper [29].

Consider now the plots of CR(QS) represented in **Figure 6(b)**. For QS less than 2*σ*, there are almost no benefits from 3D compression. However, for larger QS, the benefits become obvious: the CR provided by 3D compression is almost twice that for component-wise processing. Why does this happen? And can we predict the CR and the situations when 3D compression might be beneficial compared to component-wise coding?

### 4.2. Prediction of compression parameters

There are two main compression parameters for which prediction is desirable for compression in the OOP neighborhood, namely, the attained quality of the compressed data and the compression ratio. The general strategy of prediction is the following: one or a few simple statistical input parameters are calculated for the image to be compressed, and the predicted output parameter is obtained by substituting them into an approximating dependence obtained offline in advance.

Having described the general strategy of prediction, let us give some details. First of all, there are many parameters that can be used as inputs [45–47]. Under the condition that the noise parameters (variance) are known in advance or preestimated with appropriate accuracy, statistical parameters of the family of DCT coefficients can be used. Examples are *P*_{2σ}, the mean probability that the absolute values of DCT coefficients calculated in *N*_{bl} blocks of size 8 × 8 pixels are less than a threshold 2*σ*, and *P*_{0}, the mean probability that DCT coefficients in *N*_{bl} blocks of size 8 × 8 pixels are equal to zero after quantization with a used QS.
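Such input parameters are cheap to compute. The following sketch is our own illustration (names `P0` and `P2s` and the exclusion of the DC coefficient are our assumptions, not the chapter's specification):

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II transform matrix."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    t = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    t[0, :] = np.sqrt(1.0 / n)
    return t

def block_dct_stats(image, qs, sigma, bs=8):
    """P0: mean probability that block-DCT coefficients quantize to zero
    for the given QS; P2s: mean probability that |coefficient| < 2*sigma.
    The DC coefficient of each block is excluded (a simplification)."""
    T = dct_matrix(bs)
    zeros = below = total = 0
    for r in range(0, image.shape[0] - bs + 1, bs):
        for c in range(0, image.shape[1] - bs + 1, bs):
            coef = T @ image[r:r+bs, c:c+bs] @ T.T
            ac = np.delete(coef.ravel(), 0)       # drop the DC coefficient
            zeros += np.sum(np.abs(ac) < qs / 2)  # quantizes to zero
            below += np.sum(np.abs(ac) < 2 * sigma)
            total += ac.size
    return zeros / total, below / total

rng = np.random.default_rng(4)
noise_only = rng.normal(100.0, 10.0, (64, 64))    # pure-noise test "image"
p0, p2s = block_dct_stats(noise_only, qs=40.0, sigma=10.0)
```

Note that for QS = 4σ the two thresholds (QS/2 and 2σ) coincide, so *P*_{0} and *P*_{2σ} are equal in this sketch; for other QS values they differ.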

Obtaining the dependence between output and input parameters is a special stage performed in advance (offline). This stage presumes obtaining a scatterplot where the horizontal axis corresponds to an input parameter and the vertical axis relates to a predicted output parameter. Each scatterplot point corresponds to a test image corrupted by AWGN with a certain variance and compressed in a specified way. An example of such a scatterplot is shown in **Figure 8**.

Having such a scatterplot, curve fitting is applied to obtain the desired dependence. At this substage, several subtasks should be solved. In general, they can be treated as providing a good fit and include the choice of a proper type and parameters of the approximating function, accounting for restrictions, etc. Different criteria of fitting quality can be used [48], where *R*^{2} (goodness of fit, which has to approach unity for a good fit) is one of the most commonly employed parameters. The example for 2D image compression presented in **Figure 8** shows that the scatterplot points are not spread a lot, and it can be assumed that the dependence is a smooth function. Then, polynomials of the fourth and fifth orders and some other functions provide appropriate results (the fitted polynomial expression is presented in **Figure 8**). The performance of prediction for different input parameters should be analyzed and compared, since considerably different values of *R*^{2} can be produced both potentially and in practice [27]. Some analysis has already been carried out [27], but this study is far from complete.
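The fitting substage can be sketched with standard least-squares tools. This is our own minimal example on synthetic scatterplot data (the fourth-order polynomial choice follows the text; the data are invented):

```python
import numpy as np

def fit_and_r2(x, y, degree=4):
    """Least-squares polynomial fit plus the goodness-of-fit R^2."""
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x)
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return coeffs, 1.0 - ss_res / ss_tot

# Toy scatterplot: the output parameter grows smoothly with the input one.
rng = np.random.default_rng(2)
x = np.linspace(0.2, 0.9, 40)
y = 5.0 + 30.0 * x ** 2 + rng.normal(0.0, 0.3, x.size)
coeffs, r2 = fit_and_r2(x, y)
```

When the scatterplot points are not spread a lot, as in **Figure 8**, *R*^{2} approaches unity and the fitted curve can be stored and reused for fast online prediction.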

A similar strategy has been applied to the prediction of CR for 2D compression. The first attempt to predict CR for lossy compression of noisy images in the OOP for the AGU and ADCT [38] coders was made in 2015 [26] using two input parameters (the corresponding scatterplots are shown in **Figure 9**). As can be seen, both scatterplots have small spread and, according to their visual inspection, CR tends to increase as the input parameters grow.

Thus, we can expect that the benefits of 3D compression over 2D stem from a larger number of zeros after 3D DCT (better decorrelation of the data) than in component-wise compression. To check this hypothesis, we have determined *P*_{0} for 2D and 3D compression of the test image in **Figure 5(b)**. The results are given as dependences of *P*_{0} on QS. For 2D compression in the OOP, the CR predicted from *P*_{0} (**Figure 9b**) practically coincides with the value of the practically attained CR (**Figure 6b**). In turn, for 3D compression, *P*_{0} is larger and the predicted CR is about twice larger (**Figure 9b**), and this is in good agreement with the value of the practically attained CR (**Figure 6b**). Certainly, a more thorough study is needed. However, we can expect that the prediction of CR using *P*_{0} is possible for 3D compression as well (**Figure 10**).

### 4.3. Experimental data

The observations described above have also been verified for two types of multichannel images. The first type is Landsat TM data [50]. Different variants of uniting eight images of the same resolution into groups for further 3D compression have been considered. It has been shown that there are benefits in CR (it increases substantially for the same level of introduced distortions) only if the images combined into a group are highly correlated and have similar dynamic ranges [50]. In this case, there is an increase in the percentage of zero quantized 3D DCT coefficients compared to component-wise compression.

The second type of analyzed data is hyperspectral images acquired by the Hyperion sensor (the dataset EO1H1800252002116110KZ). Hyperion produces bad-quality (very noisy) data in some bands (for example, in subbands with indices *q* = 1–12). These component images are usually discarded in analysis, and we have not processed them either.

Hyperspectral data can be compressed with or without utilizing a VST to take into account the signal-dependent nature of the noise. Below, we consider data obtained for the procedure that employs a VST for both 2D and 3D compression. In both cases, after determining the noise parameters in all subbands (if needed), the generalized Anscombe transform and/or normalization is carried out [20]. Note that the original data are represented as 16-bit values, and this is taken into account in CR calculation and prediction.
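Since the generalized Anscombe transform is used as the VST, a minimal sketch of its classical Poisson-Gaussian form may be helpful. The parameterization of the noise variance as an SI term plus a slope times intensity, and all names, are our assumptions:

```python
import numpy as np

def generalized_anscombe(img, sigma2_si, m):
    """Classical generalized Anscombe VST for noise with variance
    sigma^2(I) = sigma2_si + m*I; the output noise std is close to 1."""
    arg = m * img + 0.375 * m * m + sigma2_si
    return (2.0 / m) * np.sqrt(np.maximum(arg, 0.0))

# Check stabilization on synthetic signal-dependent noise.
rng = np.random.default_rng(3)
true = np.full(200_000, 400.0)
m, s2 = 0.5, 25.0
noisy = true + rng.normal(0.0, np.sqrt(s2 + m * true), true.shape)
stabilized = generalized_anscombe(noisy, s2, m)
```

After such a transform, the noise is approximately additive with unit variance, so a single QS (or a QS rescaled by one common factor) can be used for all subbands, which is what makes the automatic on-board setting of compression parameters feasible.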

We have considered four variants of compressing the data. The first variant is to perform component-wise compression. The second is to divide the hyperspectral image into two groups: the first group includes subbands with indices from 13 to 57, while the second one contains subband images with indices from 83 to 224. The third variant uses groups of eight subbands. The fourth applies 16-channel groups. The subbands left over in both ranges form groups of smaller size. The CR for all subbands of a given group is the same, since all its images are compressed jointly.
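The grouping variants are straightforward to express in code. A sketch (the index ranges are taken from the text; the function name is ours):

```python
def make_groups(indices, group_size):
    """Split an ordered list of subband indices into consecutive groups;
    trailing subbands form a smaller final group, as in the text."""
    return [indices[i:i + group_size] for i in range(0, len(indices), group_size)]

# Third variant from the text: groups of eight subbands over the two used ranges.
range_a = list(range(13, 58))    # subbands 13..57
range_b = list(range(83, 225))   # subbands 83..224
groups = make_groups(range_a, 8) + make_groups(range_b, 8)
```

The two spectral ranges are grouped separately so that no group straddles the discarded noisy subbands between them; the leftover subbands of each range form the smaller final groups mentioned above.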

The obtained results are presented in **Figure 11**. Their analysis shows several interesting facts. First, the CR for component-wise compression is, on average, substantially smaller than for any of the 3D compression variants. If component-wise compression is applied, the CRs for neighboring subbands are close to each other, although the total range of CR variation is rather wide, from about 4 to about 27. In general, there is a correlation between the CR values for 2D and 3D compression: if the CR for 2D compression is larger, the CR for the variants of 3D compression is usually larger too. However, there are a few exceptions, when the CR for a particular subband image compressed separately is larger than for 3D compression. This happens for subband images with low input SNR and low correlation with the data in neighboring subbands [50].

It is difficult to tell from visual inspection of the plots (see **Figure 11**) which variant of 3D compression is preferable. More thorough analysis has shown that the CRs for the groups of 8 subbands (18.34 and 12.72) and 16 subbands (20.81 and 14.65) are quite close. The CRs for the case of using only two large groups of unequal size are slightly smaller (17.43 and 13.00).

We have also determined the percentage of zeros for 3D compression in groups of 8 and 16 subbands. The results are presented in **Figure 12**. As can be seen, there is a tight correlation between the CR for a group and the corresponding percentage of zeros. This allows expecting that it is possible to predict the CR for 3D compression in groups by analyzing the percentage of zero quantized 3D DCT coefficients.

Examples of real-life images before and after compression for particular subbands can be found in the paper [22]. If noise intensity is high and noise is visible, lossy compression provides noticeable filtering effect. If noise is invisible, original and compressed images look almost the same.

## 5. Conclusions

The task of lossy compression of multichannel remote sensing images has been considered. It is shown that this type of data has some peculiarities that have to be taken into account in compression. The main peculiarities are the signal-dependent nature of the noise, the wide limits of variation of the data dynamic range and SNR in subband images, and the substantial correlation of data in neighboring channels. Lossy compression should be carried out in an automatic manner, especially if it has to be performed on-board. It then has to adapt to the noise properties, where the simplest adaptation mechanism is to set QS proportional to the noise standard deviation (before or after a VST, depending upon whether one is applied). A good decision in compression of noisy images is to perform compression in the neighborhood of the optimal operation point. It is shown that an OOP exists for both component-wise and 3D compression, where the latter approach is preferable since it produces better denoising and a considerably larger CR. The parameters of compression can be predicted rather easily, before executing the compression, with quite high accuracy. This allows adapting the compression to the image and noise properties and deciding whether the compression performance meets the requirements.

Meanwhile, several tasks remain to be solved in the future. The main one is adaptive grouping. Another is adjusting QS to provide a desired CR.