Lossy Compression of Remote Sensing Images with Controllable Distortions

In this chapter, approaches to provide a desired quality of remote sensing images compressed in a lossy manner are considered. It is shown that, under certain conditions, this can be done automatically and quickly using prediction of coder performance parameters. The main parameters (metrics) are mean square error (MSE) or peak signal-to-noise ratio (PSNR) of introduced losses (distortions) although prediction of other important metrics is also possible. Having such a prediction, it becomes possible to set a quantization step of a coder in a proper manner to provide distortions of a desired level or less without compression/decompression iterations for single-channel image. It is shown that this approach can be also exploited in three-dimensional (3D) compression of multichannel images to produce a larger compression ratio (CR) for the same or less introduced distortions as for component-wise compression of multichannel data. The proposed methods are verified for test and real life images.


Introduction
A huge amount of data is provided nowadays by existing remote sensing (RS) sensors, both spaceborne and airborne [1,2]. Data volume is especially large if images are hyperspectral (i.e., having hundreds sub-band images) and/or high resolution ones. Note that both tendencies (to create and exploit multichannel systems as well as to produce high resolution data) are typical for recent years. Volume of acquired data additionally increases due to more frequent observations of sensed terrains [2]-it has become a usual practice to monitor a territory quite often, e.g., each week.
The obtained RS data have to be transferred, stored and/or disseminated. For each of this operation, data compression can be desirable [1,3,4]. Meanwhile, there are several obstacles that can prevent efficient execution of these operations. Concerning data transferring: bandwidth of a communication channel used to transfer data can be limited, time for transferring can be restricted, time and power for compression can be limited as well [1,3]. The same can relate to data dissemination although the limitations are usually less strict compared to downlink data transferring. Memory for RS data storage can be a problem too despite of rapid development of new facilities in recent years [2]. Therefore, it is often desired to compress RS images [4,5]. As known, there are lossless and lossy image compression techniques [1]. Limits attainable by lossless compression are practically reached [1]. Compression ratio (CR) for the existing methods rarely reaches 5 even for compressing hyperspectral data when inter-band correlation is exploited in full extent [4]. However, larger CR values are required often. Then, lossy compression of acquired RS data has to be applied.
The main peculiarity of lossy compression is that it introduces losses (distortions, degradations) into RS images. Then, it can be useful only under condition that introduced losses do not sufficiently negatively influence the goals the acquired RS data are intended for (terrain classification and/or parameter estimation, specific object detection, etc.). One assumption is that introduced losses have to be of the same level or smaller than degradations due to noise in original data [6]. Therefore, noise characteristics have to be taken into consideration and, thus, they should be known in advance or pre-estimated [7][8][9][10][11]. This also means that it is necessary to be able to control introduced distortions and/or to provide a desired level of losses. Moreover, often this should be done automatically, e.g., in on-board compression [3,12].
A slightly other assumption is possible if compressed images are subject to visual inspection and analysis. Then, introduced distortions should be such that they do not degrade image visual quality [13]. Then, one has to take into account both specific properties of component images, e.g., variations of their dynamic range [7,14,15] and peculiarities of human vision system (HVS).
Finally, one more assumption is that introduced distortions should be such that they do not have (noticeable) negative impact on classification accuracy or performance of other operations of RS data processing at final stages. Note that classification accuracy reduction is connected with metrics characterizing introduced distortions [16].
Thus, introduced distortions should be controlled for all aforementioned strategies. Here by "controlled" we mean several aspects. First, distortions have to be measured or estimated or predicted to ensure that they are not larger than allowed threshold according to a certain metric (criterion) [17,18]. Second, introduced distortions can be accurately measured only if compression and decompression are already done. Then, if distortion level has to be changed, coder parameters have to be changed and metric calculation has to be done after next iteration of compression/decompression [18]. This is often impractical, especially on-board. Then, it is more reasonable to talk about distortion estimation or prediction without compression and decompression but with approximate providing of a desired quality of compressed data.
Certainly, CR can be important as well. Then, an appropriate compromise has to be provided between CR and introduced losses. Note that CR also depends upon a used coder and a way data redundancy is exploited. In this sense, it is worth incorporating inter-channel correlation inherent for multichannel RS data that can be done in different ways [19][20][21]. It is possible to apply different transforms [11,[22][23][24] or to carry out different groupings of component images [11,25,26].
Lossy compression of images with taking into account noise type [27] and characteristics has been paid considerable attention [28][29][30]. Possible existence of optimal operation point (OOP) and its prediction have been claimed and studied [13,18]. Problems of CR prediction and its providing for coders based on discrete cosine transform (DCT) have been considered [18,31]. Meanwhile, problems of prediction of compressed image quality and providing a desired quality have not been thoroughly analyzed yet.
In this direction, a certain work has been done. In particular, an approach to quality prediction for wavelet based compression of remote sensing images has been put forward [32]. Prediction of mean square error (MSE) of introduced losses for JPEG has been done [33]. However, control and prediction of metric values for more advanced coders as AGU [34] and ADCT [35] that outperform JPEG considerably [36] were not developed till last 2 years. Since providing of a desired metric value using iterative (multiple) compression/decompression requires sufficient time and resources [36], it was decided to design a new approach without iterations [37]. Later this approach has been further advanced [38][39][40], mainly for singlecomponent (grayscale) images in 8-bit representation and with taking into account possible presence of noise.
In this chapter, we consider application of the designed approach to RS images including multichannel data and keeping in mind the following: (1) dynamic range of component images in multichannel data varies in wide limits and 16-bit representation is often used for them; (2) in many component images of multichannel (e.g., hyperspectral) data, input peak signal-to-noise ratio (PSNR) is high and noise influence is negligible; (3) there is essential correlation of signal component in neighbor sub-band images of multichannel images. We show that by taking into account these properties, it is possible to carry out efficient compression of multichannel RS data with controllable quality.

Peculiarities of RS image lossy compression
To understand the problem of lossy compression, some preliminaries are needed.
First, lossy compression introduces distortions due to which a decompressed image differs from the corresponding original one (subject to compression). These distortions are introduced at the stage of quantization of coefficients of a used orthogonal transform: wavelet, DCT or some other [34,35,41]. If DCT serves as the basis of lossy compression, quantization step (QS) or scaling factor (SF) serve as parameter that controls compression (PCC). A larger QS or SF leads, in general, to greater introduced distortions and a larger CR [34,35] but MSE of introduced losses and attained CR values considerably depend upon complexity of a compressed image and noise presence. Figure 1 presents three images: noise-free image Frisco of low complexity, the same image corrupted with additive white Gaussian noise with zero mean and variance 100, and noise-free image Airfield of quite high complexity (it contains a lot of edges and fine details).  Figure 2 shows dependences of mean square error MSE out between original and compressed images on QS for the case the advanced DCT (ADCT) coder [42] is applied. It is seen well that smaller distortions are introduced if an image is noisefree and has a simpler structure. The values of MSE out QS ð Þ for the same QS can differ by several times and, thus, i.e., QS itself does not determine MSE out QS ð Þ. Dependences CR QS ð Þ for the same images are presented in Figure 3. It is seen that the simple structure noise-free image Frisco is compressed in the best way whilst the complex structure image Airfield is compressed with the smallest CR. The reason is that the percentage of DCT coefficients that are assigned zero values after quantization increases if image complexity is lower, noise intensity is less, and QS is larger [31,43]. Thus, the rate/distortion curve is individual for each particular image and QS has to be adapted to image and noise properties to provide a desired compromise or to satisfy imposed requirements.
We have already mentioned that compression of noisy images has several peculiarities. Suppose that an acquired (noisy) image in a k-th component is image is represented as [8,10] I noisy kij where I noisy kij is the ijth sample of the kth component image, n kij is the ijth value of the in the kth component image supposed dependent on I true kij -the true value for the kijth voxel, I and J define the image size, K is the number of components. One can determine input MSE for each component image as  Dependences CR vs QS for noise-free and noisy images Frisco and noise-free image airfield.
and, respectively, input PSNR where D k is image dynamic range assumed individual for each component image Earlier analysis [7,44] has shown that MSE inp k , k ¼ 1, …, K and PSNR inp k , k ¼ 1, … , K in very wide limits for such typical examples of multichannel RS data as images provided by hyperspectral sensors AVIRIS [45] and Hyperion [46]. For more than 80% of component images, input PSNR exceeds 40 dB. This means that, most probably [42], OOPs for these component images do not exist, i.e.
Þ) has one minimum). If so, i.e. if quality of the compressed noisy image steadily decreases with QS growth, there should be some reasonable strategy to carry out compression for such an image or a group of images with similar properties. Here it is worth recalling the following. Analysis done in the paper [16] has shown that lossy compression has practically no negative impact on image classification accuracy if the metric PSNR-HVS-M [47] is not less than 42-44 dB.
The metric PSNR-HVS-M (PSNR À HVS À M c k ¼ 10 log 10 D 2 is MSE with taking into consideration specific features of human vision system (HVS)) takes into account two important peculiarities of human vision system: less sensitivity to degradations in high spatial frequencies and masking effect of textures. One can be surprised that visual quality metric has been used in analysis. This can be explained by the fact that the required values of PSNR-HVS-M > 42 dB mean that quality of a compressed image is such that introduced distortions are invisible. According to PSNR, this happens if PSNR c k exceeds 35-37 dB [48].
Thus, we need to provide a desired (controlled) quality of compressed images. This should be done quickly (desirably, without iterative compression/decompression), rather accurately, and with producing a large CR. We expect that CR increase can be gained due to grouping of component images.

An approach to providing controlled losses
Let us start from considering lossy compression of a single-channel noise-free image in 8-bit representation. After compression, one obtains fI c kij , i ¼ 1, … , I, j ¼ 1, … , J, k ¼ 1, … , Kg where quality of this image becomes worse for a larger CR or smaller bpp that takes place for larger QS or SF if a DCT-based coder is applied. Let us see how this happens for JPEG with uniform quantization of DCT coefficients. Suppose that an image to be compressed is divided into N=IJ/4 non-overlapping blocks of the size 8 Â 8 pixels. Then, in each block, we have DCT coefficients D n; k; l ð Þ; n ¼ 1; …; N; k ¼ 1; …; 7; l ¼ 1; …; 7 f g . After quantization, we have D q n; k; l ð Þ; n ¼ 1; …; N; k ¼ 1; …; 7; l ¼ 1; …; 7 È É . Then, MSE of losses can be determined as where D q n; k; l ð Þ¼ D n; k; l ð Þ=QS ½ , ΔD q n; k; l ð Þ¼QS Â D q n; k; l ð ÞÀD n; k; l ð Þ, k ¼ 0, …, 7, l ¼ 0, …, 7: and [] denotes rounding-off to the nearest integer, n denotes the block index. A usual assumption concerning distribution of quantization errors is that it is uniform or close to uniform. Then, MSE is about QS 2 =12: This is true for quite small QS (see data in Figure 2) but, for larger QS, MSE becomes smaller than QS 2 =12: The main reason is that distributions of alternating current (AC) DCT coefficients differ a lot depending upon an image. Figure 4 presents these distributions using the same scale for the three considered images ( Figure 1). Obviously, these distributions differ from Gaussian and from Laplacian (assumed in the paper [33]) as well. For the simple structure image, the distribution is quite narrow and it has heavy tails. If noise is present, the distribution "widens" and becomes closer to Gaussian.
It is seen from analysis of distribution in Figure 4a that if QS is about 10, most of AC DCT coefficients become zeros after quantization. Thus, we have decided to analyze quantization errors more in detail. Histograms of these errors for four cases are given in Figure 5. The histogram in Figure 5a shows that error distribution is close to uniform for the noise-free image Airfield that has wide distribution of AC DCT coefficients (Figure 4c). The distribution is also practically uniform for noisy image Frisco (noise standard deviation equals to 5, Figure 5d). Then, MSE of introduced losses is really close to QS 2 =12 (see data in Figure 2). In other cases (Figure 5b and c), the distributions sufficiently differ from uniform. This happens for noise-free image Frisco. Thus, introduced losses MSE is less than QS 2 =12: Hence, MSE ≈ QS 2 =12 can be treated as the upper limit of introduced losses. Note that this is valid not only for JPEG but for the coders AGU and ADCT [38][39][40]. This means that having a desired (threshold) MSE des , it is possible to easily calculate QS as ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi 12 MSE des p . A question is when the approximation MSE ≈ QS 2 =12 is valid? Note that if MSE is smaller than QS 2 =12, one can benefit from using a larger QS and providing a larger CR. Clearly, that if a desired PSNR des has to be provided, it has to be recalculated to MSE des taking into account dynamic range for a given image as Our idea [38][39][40] is that MSE can be predicted in one of two ways. The first way is determined as where R is the number of analyzed blocks (R ≪ N), C is a correcting factor used for a given coder. In other words, we employ statistics of DCT coefficients calculated in a limited number R of analyzed blocks of size 8x8 pixels. According to our studies [38,40], it is enough to have R about 500 where analyzed blocks are randomly distributed over area of an image to be compressed to have prediction accurate enough. Taking into account that number of 8 Â 8 pixel blocks in compressed images usually exceeds several thousands, prediction occurs to be much faster than even compression by JPEG. Certainly, prediction is much faster than compression by AGU (uses 32 Â 32 blocks, efficient coding and deblocking after decompression) and, especially, ADCT (exploits partition scheme optimization).
Expressions (5 and 6) allow predicting MSE for a given QS. But they do not allow direct setting of QS. One has to apply an iterative procedure that starts from QS ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi 12 MSE des p . If the predicted MSE pred (5) occurs to be considerably (e.g., by 15-20% or more) smaller than MSE des , then a larger QS has to be tried with calculating (6) for all analyzed blocks and (5) again. Since the already calculated DCT coefficients are available, the procedure is quite fast. The second way is the following. Suppose that the predicted MSE can be presented as where f 0 X ð Þ is a function of one or two parameters X that can be easily and quickly calculated for DCT coefficients determined in analyzed blocks. Then one has to find such parameter(s) and the function. To solve this task, we have exploited our earlier experience in predicting filtering efficiency [49] and compression ratio [18] by simple analysis of DCT statistics in 8x8 pixel blocks and regression analysis [50,51].
The prediction strategy is the following. We suppose that there is an input parameter (or a few parameters) that can characterize a compressed image. It is also assumed that output (predicted) parameter (MSE, PSNR, CR, or another metric) is strictly connected with this (these) input parameter(s). This connection (prediction approximation) is available to the moment to carry out prediction, i.e., in our case, the function f 0 X ð Þ has been obtained in advance (in off-line mode). Then, one has to calculate input parameter(s) for a given QS and insert it (them) into f 0 X ð Þ. It has been shown in [52] that a good parameter integrally characterizing an image (its complexity) is probability P 0 that AC DCT coefficients after quantization become equal to zero (this parameter can be also treated as probability that AC DCT coefficient absolute values are smaller than QS/2). It is obvious that P 0 can be very easily calculated. Keeping these properties of P 0 in mind, we have obtained scatter plots of 12MSE=Q S 2 to estimate f 0 P 0 ð Þ. A wide set of test noise-free images has been used that included standard optical images, test RS images and test medical image (this was done to understand does the image nature (origin) influence performance of lossy compression; in fact, very similar results have been obtained for test images of different origin; the main factor is image complexity). Each point of the scatter plot corresponds to one test image compressed with some QS where vertical coordinate is P 0 determined for this case). Figure 6 presents scatter plots obtained for AGU and ADCT coders with examples of fitted curves. The main and very important observation is that the scatter plots behave in a compact manner, i.e. points that have approximately the same arguments have close values of 12MSE=QS 2 . Another observation is that the scatter plots for two considered coders behave in a very similar manner, i.e. there is a tendency to monotonous decreasing of 12MSE=QS 2 if P 0 increases. Finally, the scatter plots confirm that, in many practical situations, MSE ≈ QS 2 =12. At least, this is true for P 0 , 0:6.
It is worth recalling here that P 0 , 0:6 corresponds to rather small QS. To prove this, Figure 7 presents the scatter plot from [48] and the fitted curve. As it is seen, for P 0 , 0:6, CR does not exceed 5. If P 0 ≥ 0:6, there is the tendency of reduction of f 0 P 0 ð Þ. The scatter plot points are placed not so compactly here. Thus, prediction using only f 0 P 0 ð Þ becomes less accurate. Nevertheless, the following prediction procedure can be proposed:  As it is seen, all the operations are very easy and fast since they are performed for a limited number of AC DCT coefficients. Moreover, using the same parameter, it is possible to predict both MSE and CR. Then, it is easy to find a proper compromise depending upon priority of requirements and imposed restrictions.
One question is what curves to fit and what are criteria of fitting quality to be used. There are different approaches but we employed goodness-of-the-fit R 2 and RMSE [50] as two main criteria (the former one has to be maximized and the latter one minimized for a given scatter plot). Without going to details, we can state the following. For each scatter plot, usually there are several functions able to provide approximately the same R 2 and RMSE. Sums of two exponentials (see an example in Figure 7), polynomials of low order, Fourier series, power functions are good candidates to be tested. Using the corresponding tools of Matlab or Excel, it is possible to quickly find optimal or, at least, appropriately good solution.

Peculiarities of compression 4.1 Visual quality metrics
We have already mentioned that it is often desirable to predict visual quality metrics. To check whether or not this is possible, the scatter plot was got for MSE HVSÀM = Q S 2 =12 À Á vs. P 0 ( Figure 8). As it is seen, this ratio is about 0.05 for small P 0 (this happens for small QS and/or complex structure images), i.e. PSNR-HVS-M is by about 13 dB larger than PSNR. This means that introduced losses are masked by image content well and, most probably, they cannot be noticed visually. The difference in PSNR-HVS-M and PSNR decreases to 5-7 dB for P 0 . 0:5, i.e. typical conditions of lossy compression. The scatter plot and the fitted curve show that MSE HVSÀM can be predicted well for a given QS. In other words, visual metrics can be predicted too using the proposed approach. Again, the sum of two exponentials (just this case is presented in Figure 8) can serve well as approximation curve with quite small number of varied parameters.

Experimental data for component-wise compression
Let us present the results of applying the proposed approach to real-life hyperspectral data. Images of Hyperion sensor dataset EO1H1800252002116110KZ have been compressed. Hyperion sensor produces data of bad quality (very noisy) in sub-bands with indices k = 1,…,12 and k = 58,…,76. The images in these sub-bands are often discarded in analysis, so we have not compressed them.
Then, two approaches to compression have been compared. Both presume component-wise compression. The first one has been proposed earlier [11]. Images are compressed after applying variance stabilizing transform that takes into account signal-dependent noise properties and converts this noise to additive with variance approximately equal to unity. Then, the recommended QS = 3.5 (this notation is used in figures below). Inverse transform is applied component-wise after decompression. For the proposed method, the component-wise images have been transformed to the interval from 0 to 255. Then, for each of them, AGU coder has been applied with QS = 17 that approximately corresponds to PSNR des ¼ 34:5 dB (MSE des ≈ 24 ≈ 17 Â 17=12Þ. The notation QS = 17 is used for the corresponding data. The obtained PSNR values calculated between compressed and original component images are presented in Figure 9. As it is seen, PSNR for the method [11] in most sub-bands occurs to be considerably larger than PSNR des set by us. Only in some sub-bands (indices 165-185) where input PSNR is quite small the determined PSNR values are about 40 dB (i.e., the introduced losses are invisible in decompressed images). For the proposed approach, PSNR for the introduced losses is considerably smaller but, for all sub-band images, PSNR anyway exceeds 35 dB. As it follows from analysis of data in Figure 10, CR for all sub-bands exceeds 5 (a more detailed study shows that P 0 . 0:6 in all cases). Thus, MSE is smaller than Q S 2 =12 (see data in Figure 6) and the provided PSNR is larger than expected.
The main observation for data in Figure 10 is that CR for the proposed method is by several times larger than for the prototype method for almost all sub-bands except the bands with small input PSNR. Thus, we have gained essential benefit in CR sense while introduced distortions remained invisible. We do not present examples of original and compressed component images because visually they are identical. Note that setting a larger PSNR des leads to larger PSNR of introduced losses and smaller CR for each component image, respectively. By setting a larger PSNR des one can ensure that classification accuracy does not make worse.

3D compression
Consider now possibilities of 3D compression in groups. There are many different options [11]. We have analyzed one of the simplest ones where component images have been transformed to the 8-bit representation limits, then combined in 4-band groups, and then compressed by 3D version of AGU coder. After decompression the images have to be "stretched" to original limits.  As previously in Section 4.2, we have employed QS = 17. For convenience of comparison, the obtained data are also presented in Figures 9 and 10, for 3D compression they are denoted as QS = 17, bl = 4. CR values for the 3D case are shown the same for all components of the same group. As it is seen, CR values for 3D compression are about two times larger than for the proposed component-wise compression. This is an obvious advantage of 3D compression. Meanwhile, there are also very interesting observations stemming from analysis of data for PSNR ( Figure 9). As it is seen, there are many sub-bands for which PSNR for 3D compression is considerably larger (and the introduced losses are sufficiently smaller) than for component-wise compression. PSNR values are almost the same if subbands with small input PSNR are compressed. This is one more positive feature of 3D compression that should be studied more in detail in the future.

Conclusions
We have considered the task of lossy compression of RS images with controllable quality characterized by traditional metrics. It is shown that MSE and PSNR can be predicted for DCT-based coders and, due to this, it is possible to provide a desired MSE or PSNR without compression/decompression iterations quite quickly and accurately. Being applied to compress RS images without visible distortions, this approach allows providing CR considerably larger than for approach based on taking noise properties into account.
Moreover, it is demonstrated that prediction of some visual quality metrics is also possible. It is also shown that 3D compression of images collected into groups provides considerably better results. However, additional studies are needed to predict distortion parameters in this case. Examples for real-life data as hyperspectral image are presented. This research has been partly supported by the Project M/29-2018 of Ukrainian-French program "Dnipro" and STCU Project No. 6386.