Open access peer-reviewed chapter

On the Application of Dictionary Learning to Image Compression

Written By

Ali Akbari and Maria Trocan

Submitted: 16 May 2019 Reviewed: 10 August 2019 Published: 12 September 2019

DOI: 10.5772/intechopen.89114

From the Edited Volume

Coding Theory

Edited by Sudhakar Radhakrishnan and Muhammad Sarfraz

Chapter metrics overview

943 Chapter Downloads

View Full Metrics

Abstract

Signal models are a cornerstone of contemporary signal and image-processing methodology. In this chapter, a particular signal modelling method, called synthesis sparse representation, is studied which has been proven to be effective for many signals, such as natural images, and successfully used in a wide range of applications. In this kind of signal modelling, the signal is represented with respect to dictionary. The dictionary choice plays an important role on the success of the entire model. One main discipline of dictionary designing is based on a machine learning methodology which provides a simple and expressive structure for designing adaptable and efficient dictionaries. This chapter focuses on direct application of the sparse representation, i.e. image compression. Two image codec based on adaptive sparse representation over a trained dictionary are introduced. Experimental results show that the presented methods outperform the existing image coding standards, such as JPEG and JPEG2000.

Keywords

  • image compression
  • dictionary learning
  • sparse representation

1. Introduction

Signal models are fundamental tools for efficiently processing of the signals of interesting, including audio recordings, natural images, video clips, and medical scans, to name just a few. A signal model formulates a mathematical description of the family of signals of interesting in a way which faithfully captures their behaviour. Designing accurate signal models, which efficiently capture useful characteristics of the signals, has been a crucial aim in the signal processing area for so many years and a variety of mathematical forms has been proposed. Sparsity-based modelling has been used in many applications in which each signal is represented in terms of linear combinations of an underlying set, called dictionary, of elementary signals known as atoms, resulting in simple and compact models. The driving force behind this model is sparsity, i.e. the rapid decay of the representation coefficients over the dictionary. In this signal modelling, the dictionary plays an important role for the success of entire model in an efficient representation of the signal.

Finding appropriate dictionaries with good predictive power of various signal classes of interest and high compactness ability, especially natural images, has been an active field of research during past decades. The early attempts for designing dictionaries were based on building the model using harmonic analysis of the signal classes and extracting some mathematical functions, resulting in a fixed off-the-shelf dictionary called analytic or mathematical dictionary. The sparse representation using these fixed mathematical dictionaries is called analysis sparse modelling. The long series of works on designing the analytic dictionaries lead to appearing various transforms such as Fourier and its discrete version, discrete cosine, wavelets, curvelets, contourlets, bandlets, and steerable wavelets.

A significantly different approach to the sparse modelling, originally introduced by Olshausen and Field [1], consists of learning a dictionary from some training data. The sparse representation using this trained dictionary is called, synthesis sparse modelling. The trained dictionaries, also called synthetic dictionary, could efficiently capture the underlying structures in the natural image patches and are well adapted to a large class of natural signals.

The analysis and synthesis sparse signal modelling has led to the design effective algorithms for many image-processing applications, such as compression [2, 3, 4, 5, 6, 7, 8] and solving inverse problems [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]. A straightforward application of the sparse signal modelling in the field of image processing has been image compression due to providing a compact representation of the signal. This chapter presents the theoretical and practical aspects of using the dictionary-based sparse signal modelling for image compression. We address one important question: How to efficiently represent an image over a trained dictionary in order to improve the performance of the image compression? Based on this concept, we present two novel image compression methods. Experimental results show that the introduced methods outperform the existing image coding standards, such as JPEG and JPEG2000.

The remainder of chapter is organized as follows: It starts with introducing the concept of sparsity in signal processing. The use of sparse representation modelling with respect to trained dictionaries is presented. Next, a brief description of a well-known and yet effective dictionary learning algorithm, introduced by Aharon et al. [23], is given. The end of chapter is devoted to introducing two generic image compression schemes. Finally, we conclude the chapter.

Advertisement

2. Sparsity-based signal models

One of the well-known methods in designing of the signal models is linear approximation. In this modelling technique, given a set of vectors d k R N k = 0 K 1 , a signal x R N is represented as a linear combination of K basis,

x k = 0 K 1 c k d k , E1

where set c k k = 0 K 1 consists of representation coefficients. The signal approximation Eq. (1) can be reformulated in a matrix form as

x Dc , E2

where c = c 1 c 2 c k T R K is the coefficients vector. The matrix D = d 1 d 2 d K R N × K is called dictionary and its columns constitute the dictionary atoms.

With the right choice of dictionary, the coefficients vector c is expected to be sparse, in the sense that its sorted coefficients decay rapidly. Motivated by this idea, the design of efficient complete dictionaries, in which, i.e., K = N was an active area of research during the last decades of twentieth century. The well-known Fourier transform [24], used in the JPEG compression standard [25], and wavelet transform [2], used in JPEG2000 compression standard [26], are the results of these works which sparsify uniformly the smooth signals. However, this signal modelling loses its optimality in representation of the image signals due to existence of curve singularities (elongated edges) in these types of signals [27]. As an instance, the images encoded by the JPEG2000 standard suffer from the ringing (smoothing) artefacts near edges.

In an attempt to minimize this weakness of the dictionaries, the design of more general over-complete dictionaries, which have more atoms than the dimension of signal, i.e. K > N , has been investigated over the past decades, and is still intensely ongoing. These dictionaries have a more descriptive ability to represent a wide range of interesting signals, in comparison with the invertible complete dictionaries.

2.1 Sparse modelling using over-complete dictionaries

Compared to the complete case, representation with over-complete dictionaries must be more carefully defined. There are two distinct paths for representing a signal using the over-complete dictionaries: analysis path and synthesis path. The analysis sparse modelling relies on the classical basics of the signal modelling in which the representation of the signal is identified as a linear combination of atoms,

x Dc a , E3

where the coefficients vector c a is obtained via the inner products of the signal and the dictionary c a = Ωx = D T x . This method has the advantage of providing a simple and efficient way to achieve sparse representation over the dictionary. In this case, every signal has an unique representation as a linear combination of the dictionary atoms.

Increasing sparsity, in order to obtain a well-defined representation, requires departure from this linear representation towards a more flexible and non-linear representation. Each signal is represented using a different set of atoms from a pool, called dictionary, in order to achieve the best sparsity. Thus, the approximation process becomes

x Dc s , E4

where the coefficients vector c s is obtained via a non-linear approach, in contrast to the linear-based approach in the analysis path. This signal modelling approach, called synthesis sparse representation, needs further refinement to find the well-defined representation due to degrees of freedom identified by the null-space of D [27], which leads to a non-unique choice of c s in Eq. (4), as opposed to the analysis sparse modelling which has a unique solution. In order to find the most informative representation, the coefficients vector c s is obtained with respect to some cost function F , which minimizes the sparsity of the coefficients vector c s under a reconstruction constraint:

c s = arg min c R K F c subject to x Dc 2 2 ϵ , E5

where ϵ is the prior knowledge about the noise level. The penalty function F is defined in a way that is tolerant to the large coefficients and aggressively penalizes the small coefficients [27]. The normal choice for this function is p norm with 0 p 1 . Of specific interest is the 0 case, i.e., F c = c 0 , which counts the number of non-zeros in the representation. For this case, the problem Eq. (5) becomes

c s = arg min c R K c 0 subject to x Dc 2 2 ϵ . E6

This problem, known to be NP-hard in general, can be efficiently approximated based on the idea of iterative greedy pursuit. The earliest and yet effective one includes the orthogonal matching pursuit (OMP) [28].

This formulation Eq. (6) has gained a large success beyond the statistics and signal processing communities and has been extensively employed in different signal processing algorithms. In the image-processing applications, since the size of natural images is too large, it is chosen to partition the image into blocks and the sparse modelling is done on the set of image blocks X = x 1 x 2 x L , each of size N × N pixels, where N is an integer value and x i R N is lexicographically stacked representation of the i -th image patches.

2.2 Dictionary choice

In the discussion so far, it is assumed that the dictionaries of the analysis and synthesis models are known. Choosing the dictionary carefully is an important and involving task, in which substantial research has been invested. Based on analysis and synthesis models, the scientific community has developed two main routes for designing the dictionaries [27].

The first one is analytic dictionaries derived from a set of mathematical assumptions made on the family of the signals. The dictionaries of this type are generated by finding the appropriate mathematical functions through harmonic analysis of the interesting signals for which an efficient representation is obtained. For instance, Fourier basis is designed for optimal representation of smooth signals, while the wavelet dictionary is more suitable for piecewise-smooth signals with point singularities.

Designing analytic over-complete dictionaries are formulated as DD T x = x for all x . Then, the approach tries to establish an appropriate dictionary by analysing the behaviour of D T x and establishing a decay rate. The curvelet [29], contourlet [30], and bandlet [31] transforms are some of the analytic dictionaries which provide comprehensive frameworks in order to handle the multi-dimensional signals.

Finding the more compact sparse representation has been a major driving force for the continued development of more efficient dictionaries. The synthesis formulation of the sparse representation paved the way to the design of an efficient dictionary, called synthetic dictionaries, from signal realizations via machine-learning techniques. The basic assumption behind this approach is that the structure of the complex natural signals can be more accurately extracted directly from the data than by using a general mathematical model [27]. In fact, this approach replaces prior assumptions on the signal behaviour with a training process which constructs the dictionary based on the observed signal properties. Compared to the analytic dictionaries, the synthetic dictionaries deliver an increased flexibility and the ability to adapt to specific signals and are superior in terms of representation efficiency at the cost of a non-structured and substantially more complex dictionary.

In this approach, a dictionary is trained for the sparse representation of small patches collected from a number of training signals. The desire to efficiently train a dictionary for the sparse representation led to developing some algorithms so far [1, 23, 32, 33, 34]. The earlier works on the dictionary learning mostly focused on statistical methods. Given the training image patches X = x 1 x 2 x L , where L is the number of training patches, this method finds a dictionary which either maximizes the likelihood of the training data P X D [35] or the posterior probability of the dictionary P D X [36]. These formulations lead to some optimization problems that are solved in an expectation-maximization fashion, alternating estimation of the sparse representation and the dictionary using gradient descent or similar methods.

In neuroscience area, Olshausen and Field [1] proposed a significantly different approach for designing the dictionary using the training data, benefiting from modelling the receptive fields of simple cells in the mammalian primary visual cortex. In another attempt, the K-SVD algorithm introduced by Aharon et al. [23] is one of the well-known methods of dictionary learning. Given a set of examples X = x 1 x 2 x L , the goal of the K-SVD algorithm is to search the best possible dictionary D R N × K for the sparse representation of the training set X through the optimization problem of Eq. (7):

arg min C , D i = 1 L c i 0 subject to X DC 2 2 ϵ , E7

where ϵ is a fixed small value and C = c 1 c 2 c L is a matrix of size K × L , consisting of the representation coefficients vectors c i i = 1 L of the training samples. The expression in Eq. (7) is performed iteratively. First, by considering an initial dictionary, the algorithm tries to find the best coefficients matrix C that can be found. Once D is known, the penalty, posed in Eq. (6), reduces to a set of L sparse representation operations, like the ones seen in (Section 2.1). The OMP algorithm [28] is used for the near-optimal calculation of the coefficients matrix C .

At the next stage, the columns of dictionary are sequentially updated and relevant coefficients in the matrix C are simultaneously changed. At a time, one column is updated and the process of updating one column is based on the singular value decomposition (SVD) on the residual data matrices, computed only on the training samples that use this atom. The K-SVD algorithm includes a mechanism to control and rescale the 2 -norm of the dictionary elements. Indeed, without such a mechanism, the norm of D would arbitrarily go to infinity. For more details, refer to [23].

In the image-processing applications, since the size of natural images is too large for learning a full matrix D , it is chosen to learn the dictionary on a set of natural image patches X = x 1 x 2 x L , each of size N × N pixels, where N is an integer value and x i R N is lexicographically stacked representations of the i -th image patches.

2.3 Analysis versus synthesis

As mentioned before and also outlined in [27], some of the most important elements of effective dictionary design include localization, multi resolution, and adaptivity. Modern dictionaries typically provide localization in both the analysis and synthesis routes. However, multi-resolution property is usually better supported by the analytic structures, whereas adaptivity is mostly found in the synthetic methods.

The most important advantage of the analytic dictionaries is the easy and fast implementation. On the other hands, the main advantage of the trained dictionaries is their ability to provide a much higher degree of specificity to the particular signal properties, allowing them to produce better results in many practical applications such as image compression, feature extraction, content-based image retrieval and others. However, some drawbacks might arise after the compactness introduced by synthesis scheme. In this case, only if a few atoms for presenting a signal, the importance of all atoms largely varies. Consequently, any wrong choice of one atom could potentially lead to additional erroneous atoms that are selected as compensation, deviating further from the desired reconstruction. This weakness is usually appeared in the 0 -norm-based non-convex optimization problem of Eq. (6). The convex relaxation approaches from 0 to 1 are more stable for the sparse representation at the expense of computational complexity. In the analysis formulation, however, all atoms take an equal part in describing the signal, thus minimizing the dependence on each individual one, and stabilizing the recovery process.

Advertisement

3. Image compression

Reducing the cost for storage or transmission of image signals with negligible degradation in the quality is the main goal of an lossy image compression algorithm. It tries to remove the redundancies among the image data by adopting different ways, such as transferring the image into a transform domain with compressible coefficients. In the transform domain, a few significant coefficients capture a large part of the image information. A typical lossy image compression algorithm usually encodes these significant transform coefficients to reduce the requirements for image storage or transmission. The analysis and synthesis sparse modellings are two powerful tools for transforming the image into a compressible domain. The JPEG [25] and JPEG2000 standards [26] are the results of using the analysis sparse representation of the image by designing analytic dictionaries, e.g. discrete cosine transform (DCT) and discrete wavelet transform (DWT), respectively. Since analytic dictionaries-based image sparse representation is typically over-simplistic, it fails to represent the high-textured images efficiently [27]. Due to this weakness of analysis sparse modelling in the efficient expressiveness, an extensive body of literature has recently focused on various applications of the synthesis sparse signal modelling via a trained dictionary. In this way, the performance can be significantly improved for the image compression application, benefiting the sparse representation of the image over a dictionary specifically adapted to it.

In order to improve the limitations of the traditional sparse representation approaches over a trained dictionary, several studies have been proposed in the literature [37, 38, 39]. In [37], the authors train a set of dictionaries. In order to improve the compression performance, the sparse representation is achieved by choosing an optimal dictionary among the trained dictionaries. In [38], a set of dictionaries in a tree structure is trained. At each level of tree, a dictionary is learned via generating the image residual. The residual is obtained by difference between the original and recovered images using the trained dictionary at the previous level of tree. Sparse representation is done by selection of one atom from each dictionary at each tree level. The total number of atoms is determined according to the sparsity level. Authors in [39] introduce the concept of multi-sample sparse representation (MSR) and incorporate it into the dictionary learning process. Each image patch is encoded with a certain sparsity level. To do this purpose, multiple neighbouring image patches are considered during the sparse representation to explore different sparsity levels. Based on this concept, an MSR-based image coding approach is proposed in [39].

Using a trained dictionary, each image is represented over the dictionary. The coefficients vector can be obtained via different algorithms such as the basis pursuit algorithms, matching pursuit techniques and other schemes [40]. These conventional approaches usually consider constant sparsity level, i.e. a fixed number of dictionary atoms is considered for representing all the image patches with different characteristics. This approach leads to a weak image compression performance. In this section, an adaptive sparse representation approach is presented. It is based on this fact that the visual significance, called visual saliency, of each image patch varies with its location within the image [41]. In other words, the human visual system (HVS) usually focuses on some parts of the image (salient regions), while other parts of the image have a lower level of visual interest. Therefore, designing an adaptive sparse representation scheme by considering the HVS characteristics plays an important role in designing an efficient image compression algorithm.

3.1 Dictionary learning-based image codec

Figure 1 presents the block diagram of the dictionary learning-based image coding (DLC) framework [7]. The DLC mainly has four main parts, pre-processing, dictionary learning, adaptive sparse representation and entropy coding. First, at the pre-processing step, the input image is divided into L non-overlapping image patches X = x i i = 1 L . x i N represents a B × B image patch vectorized into a vector of size N . As other coding algorithms, the mean values of image patches (DC components) M = m i i = 1 L and AC components Y = y i i = 1 L are separately encoded. The AC components are represented with respect to a trained dictionary D of size N × K learned by the K-SVD dictionary learning algorithm [23]. As other dictionary-based image compression methods, the trained dictionary should be shared between encoder and decoder. The OMP algorithm is used for sparse representation step in an adaptive manner using the visual saliency information. Incorporating this adaptive sparse representation step into the image coding framework aims to further reduce the reconstructed errors. Finally, the DC elements M = m i i = 1 L and representation coefficients C = c i i = 1 L are entropy coded, where c i K denotes the coefficients vector of the i -th image patch. In the following sections, the adaptive sparse representation and entropy coding steps are detailed. We ignore the details of decoder. However, it should be noted that the decoder can retrieve the image by a minor application of the above steps.

Figure 1.

Block diagram of the dictionary-based image coding (DLC) framework.

3.1.1 Adaptive synthesis sparse representation

The sparse representation step has a strong impact on effectively encoding of image patches and thus the rate-distortion performance. Graph-based visual saliency (GBVS) model, proposed in [42], has an capability to extract the saliency map of an image, i.e. locations within an image where have a high visual interest to a human observer. By incorporating this model into the sparse representation step, we build up an image coding scheme which compresses the image more efficiently than the traditional coders. Please refer to [42] for the details of the GVBS model. This saliency map of the image is exploited to determine the visual significance of the image patches in order to allocate different sparsity levels to each patch according to its visual significance to the HVS.

It should be noted that we normalize the saliency map values to the range 0 1 . Therefore, the intensity of each pixel of this normalized saliency map stands for the probability belonging that pixel to a salient region. Some examples of saliency maps are shown in Figure 2 . In the second row, brighter regions represent the salient locations which a human observer pays more attention to, while the darker areas represent the less-saliency regions.

Figure 2.

The original images and their saliency maps.

To obtain the saliency value of each image patch, first, the saliency map is partitioned into non-overlapping blocks of size B × B pixels. Then, the average of the saliency values of pixels belonging to each block is considered as the saliency value of the corresponding patch. Let H i denotes the saliency value of i -th image patch. Since saliency value of each image patch varies with its position within the image; therefore, by assigning different sparsity level to each block according to its saliency value leads to an improvement in the rate-distortion performance.

Consider the sparsity level of i -th image patch as:

S i = α i S . E8

Given the target sparsity level S , we aim to allocate a different sparsity level S i to the i -th image patch. It should be noted that the average of sparsity levels of all image patches must be equal (or slightly inferior) to the target sparsity level S . The factor α i can be easily obtained via:

α i = L × H i i = 1 L H i . E9

where H i i = 1 L denotes the set of saliency values. As a result, a set of sparsity levels S i i = 1 L is obtained via Eq. (8). Based on these new obtained sparsity levels, the sparse representation of each patch x i over the dictionary D is achieved by:

argmin c i x i Dc i 2   Subject To   c i 0 S i . E10

The OMP method in [43] is used to solve this problem. By assigning different sparsity level S i to each block, a more effective sparse representation of the image is obtained.

3.1.2 Quantization step and entropy coding

Differential pulse-coded modulation (DPCM) scheme is used to quantize the DC elements. Let e i = m i m i 1 denotes the residual between DC values of two neighbouring patches i and i 1 (here we set m 0 as 0). Instead of encoding M = m i i = 1 L directly, the residuals between subsequent DC elements, i.e. E = e i i = 1 L are quantized and encoded. The quantization is achieved via round e i / b where b is a constant value. A dead-zone quantizer is used for quantization of the residuals [43].

In order to further remove the redundancy, the Huffman coder is used to entropy-code the quantized DC values. During the entropy coding step, pre-defined code-word tables are used built offline. The same procedure is employed for quantizing and encoding the nonzero coefficients of C = c i i = 1 L .

A big part of the bit-stream generated by the above scheme is occupied by encoding the indices of the representation coefficients, I . The reason behind this fact is that the non-zero coefficients in C = c i i = 1 L have a random structure. To efficiently encode this random pattern, a quad-tree splitting algorithm is employed. First, each vector c i is divided into two equal parts. At the next step, a binary evaluation is achieved on each part: if each part consists of one nonzero coefficient, the encoder outputs 1; otherwise, the encoder sends 0. Each part including one nonzero coefficient at least, is then divided into two separate parts and the binary evaluation is again performed. This process continues until the maximum depth of partitioning is reached. In comparison with other methods like fixed length coding [39], this quad-tree splitting algorithm encodes the indices of nonzero coefficients more efficient.

3.2 Down-sampling-based codec

In a basic image compression approach, a down-sampling strategy is employed at the encoder side, as another lossy operation. Then, at the decoder side, an up-scaling algorithm is employed to reconstruct the high resolution (HR) image. In this technique, the high-frequency details within the image are removed by employing the down-sampling operator before the quantization lossy operator. Thus low frequency information within an image, where contain most of the energy image, is encoded by higher bit-rate. This bit allocation scheme leads to the better rate-distortion performance at the low bit-rates. Although the down-sampling operator improves the rate-distortion performance at low bit-rates, it eliminates the high frequency (HF) details. These information should be reproduced in order to recover a high quality image at the higher bit-rates.

In the DLC encoder, presented in previous section, the quantization and sparse representation are the main compression tools. In this section, the down-sampling operator is incorporated into the DLC codec as another lossy operator to further remove redundancies existing within an image. Instead of encoding the whole image, its low-resolution (LR) version is encoded. At the decoder side, an up-scaling algorithm is carefully designed to recover the high-frequency details. As discussed before, this strategy enhances the coding efficiency at the low bit-rates. However, designing an efficient up-scaling algorithm is an important step at the decoder side by which the high frequency information can be correctly recovered. A failure in designing a good up-scaling algorithm leads to a weak rate-distortion performance at the high bit-rates. The presented up-scaling scheme in this section recovers the HR image by encoding and sending the residual image, i.e. difference between the original image and the up-scaled one, as side information. The encoder generates the residual image and encodes it by the sparse representation of the residual image over a trained dictionary. Decoder recovers the final image by combination of the LR decoded image and the reconstructed residual. Both dictionaries used for the sparse representation of LR residual images are trained by a bi-level dictionary learning algorithm. Further, an image analyser is designed and incorporated into the encoder. The goal of this image analyser is for designing an adaptive sparse representation in order to assign higher bit-rate to the salient parts of the image. Experimental results illustrate that this down-sampling-based image compression scheme based on sparse representation (DCSR) achieves better rate-distortion performance when compared with the conventional codecs, such as JPEG and JPEG2000, and the DLC codec, described in Section 3.1.

The block diagram of the DCSR encoder is shown in Figure 3 . This framework mainly consists of five parts, including down-sampling, up-scaling, sparse representation, quantization and entropy coding [8]. An image analyser is also introduced to the core of codec. At the pre-processing step, the down-sampling operator uses a blurring convolution kernel and a simple decimator by three to produce the LR image with only 1/9 of total pixels of the original image. At the up-scaling step, the down-sampled image is restored back to its original resolution. The goal of the up-scaling step is designing an accurate algorithm to improve the rate-distortion performance at the higher bit-rates. A joint sparse representation method is employed to solve this ill-posed, complex inverse problem. This method guarantees to restore the high frequency details which are enough and sufficient for reconstruction of the HR image.

Figure 3.

Block diagram of the DCSR codec.

Assume the LR image, X L , is obtained via X L = HX H , where H is the down-sampling operator and X H denotes the HR image. After the image recovery via the up-scaling algorithm, the relationship between the HR and the reconstructed images is described as X H = X ˜ H + E , where E is the residual image. The LR image X L and the residual image E are separately partitioned into non-overlapping image patches of size B × B . Let us assume x i i = 1 T L and e i i = 1 T E denote the vectorized image blocks in the X L and E , respectively, where T L and T E are the number of patches in the X L and E , respectively. Moreover, note that T L = 1 / 9 T E .

At the sparse representation step, the LR image patches x i i = 1 T L are sparsely represented over an over-complete dictionary D L of size N × K using the OMP algorithm. An image analyser is carefully designed to use the visual saliency information at the sparse representation step. The goal of the analyser is to further reduce the reconstructed errors in the areas within the image where have higher visual interest to the HVS. The same procedure is also applied for the sparse representation of the image patches of the residual image e i i = 1 T E . A different over-complete dictionary D E of size N × K is used for the sparse representation of the residual patches. The two dictionaries D L and D E are trained offline using a bi-level dictionary learning algorithm. It is assumed that these dictionaries are shared between the encoder and decoder. Finally, the obtained sparse coefficients of LR patches and residual patches, i.e. C L = c i L i = 1 T L and C E = c i E i = 1 T E respectively, are quantized and entropy coded in order to obtain the bit-stream. c i L R K and e i L R K denote the coefficients vector of the i-block in the LR image X L and the residual image E , respectively.

At the decoder side, the image is reconstructed by a minor application of the above steps. First, the reconstructed LR image X ̂ L is up-scaled by the up-scaling operator to restore the image X ̂ H of size of the original image. Then, the final restored image is obtained via X D = X ̂ H + E ̂ , where E ̂ denotes the decoded residual image.

3.2.1 Image analyser for adaptive sparse representation

The main goal of image analyser is to enhance the performance of the DCSR encoder. It reduces the reconstructed error in some parts of the image which have higher visual interest to a human observer. Therefore, the image analyser extracts the salient regions within the image according to its visual significance to the HVS. Then, it allocates higher rates to the salient regions by assigning different sparsity levels to the image patches, located within the salient parts, in both LR and residual images. As mentioned before, the GBVS model [42] is used to obtain the salient regions within an image. Some examples of saliency maps are shown in Figure 2 .

For finding the sparsity levels of the image patches in the LR image, the obtained saliency map is re-sized to the same size of the LR image. Then, it is divided into B × B blocks to obtain the corresponding saliency value H i of the i -th block by taking the average of the saliency values of pixels within that block. The goal is assign a different sparsity level S i to each block. Note that the average of the sparsity levels assigned to all blocks within the image should be equal (or slightly inferior) to the target sparsity level S . Given the target sparsity level S and the set of saliency values H i i = 1 T L , the sparsity level of the i -th block is obtained by:

S i L = T L × H i i = 1 T L H i S , E11

where T L is the number of blocks in the LR image. Based on this strategy, different sparsity levels are allocated for the sparse representation of the image patches in the LR image. Given the obtained sparsity levels S i L i = 1 T L , the sparse representation of the LR patch x i in the LR image over the dictionary D L is achieved by [44]:

argmin c i L x i D L c i L 2 ,   s . t .   c i L 0 S i L . E12

The same procedure is applied to obtain the sparsity levels S i E i = 1 T E for the image patches in the residual image. The set of saliency values H i i = 1 T E is obtained using the saliency map of the original image, as described before. The sparse representation of each patch e i in residual image with respect to the dictionary D E is achieved by:

argmin e i E e i D E c i E 2 ,   s . t .   c i E 0 S i E . E13

The OMP method is used to solve these problems. At the low bit-rates, DCSR encodes only the LR image. At the higher bit-rates, DCSR encodes the residual image and send it as side information in order to improve the coding efficiency. In this case, the patches within the residual and LR images are represented over the corresponding dictionaries with the same target sparsity level S .

3.2.2 Joint dictionary-mapping learning for up-scaling

The goal of up-scaling algorithm is to recover the high frequency information eliminated during the down-sampling operation. One well-known up-scaling approach is to learn a function which maps the LR and HR image patches. This mapping function can be obtained using two training databases of the LR and HR image patches. Instead of finding the direct mapping function between the LR and HR patches, authors in [44] show that the mapping function can be efficiently learned in the transform domain. In this approach, first, the LR and HR patches are transferred into the representation space over the trained dictionaries and then the relationship between representation coefficients is learned.

Let two sets X and Y consist of N training LR and corresponding HR image patches, respectively. Suppose D x R M × K and D R M y × K denote the trained dictionaries for the sparse representation of the LR and HR patches. These two dictionaries are trained jointly such that the sparse representation of all training LR and HR patches are mapped to each other via a linear function M R K × K . The dictionaries and mapping matrix are obtained by solving the following minimization problem:

arg min D x , D y , Λ x , Λ y , M X D x Λ x 2 2 + Y D y Λ y 2 2 + γ Λ x y 2 2 + λ Λ x 1 + λ Λ y 1 + λ m M 2 2 , E14

where Λ x R K × N and Λ y R K × N represent the corresponding sparse representation matrices. The terms γ , λ and λ m denote regularization parameters. As proposed in [45], the minimization problem Eq. (14) is solved in an iterative approach with respect to a variable while all the other variables are considered as fixed parameters.

After training process, the up-scaling is achieved by the dictionary pair (i.e. D x and D y ) and the mapping matrix M . First, the LR image is divided into overlapping blocks of size B s × B s . Then, the sparse representation vector α y of an LR patch y is obtained with respect to the trained dictionary D y . In the minimization problem Eq. (14), we use 1 -norm to train the dictionaries. Given these trained dictionaries, the up-scaled algorithm uses 0 -norm, instead of 1 norm, in order to improve the rate-distortion performance. Thus, the sparse representation of LR patch α y is obtained by solving the following minimization problem:

α y = arg min α y D y α 2 2 + δ α 0 , E15

where δ denotes the regularization parameter. At the next step, we derive α ̂ x = M α y and restore the HR patch via x ̂ = D x α ̂ x . At the last step, the restored up-scaled image X ̂ H is obtained by replacing the recovered image patches into the whole image grid and taking the average over the overlapped pixels. For more details about solving the above optimization problems, please refer to [44].

3.2.3 Bi-level dictionary learning algorithm for image compression

The up-scaling algorithm, described in previous section, leads to a bi-level dictionary learning algorithm which used for image compression at the DCSR encoder. Note that the trained dictionary D L and the up-scaling operator are used to estimate a large part of signal energy. The remained energy of residual is captured by representing the residual over another trained dictionary. This residual is encoded as side information to improve the quality of the restored image at the higher bit-rates. The bi-level dictionary learning algorithm is achieved by the following steps: first, the training images are down-sampled by a factor 3. Second, these LR images are employed to train a dictionary D L by the K-SVD algorithm [23]. Third, the LR training images are represented with respect to the trained dictionary D L . Forth, the reconstructed LR images are then up-scaled by the algorithm described in previous section. Fifth, these up-scaled images are used to create the training residual images which are subsequently used to train the second dictionary D R by the K-SVD algorithm.

3.2.4 Quantization and entropy coding

Differential pulse-coded modulation (DPCM) is used to quantize the non-zero elements in the sparse representation vectors C L and C E . A dead-zone quantizer is used for quantization. Further, the classical Huffman coding is employed to entropy-code the quantized sparse coefficients. Predefined code-word tables, needed for Huffman encoder, are constructed offline and stored at both encoder and decoder sides.

The location of non-zero coefficients in C L and C E follow a random pattern. Therefore, encoding the indices of these coefficients occupies a big part output bit-stream. For efficiently encoding these indices I , the quad-tree splitting algorithm, explained in Section 3.1.2, is employed.

3.3 Experimental results

In this section, the rate-distortion performance of the DCSR codec is examined by performing a suite of experiments on a set of 8-bit grey-scale standard images. All evaluated images are re-sized to size of 528 × 528 pixels in order to produce the LR image with only 1/9 of total pixels of the original image. The well-known peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) index are chosen as a measure in the experiments. Further, the rate distortion performance is compared with the JPEG and JPEG2000 standards, as well as the DLC codec, presented in Section 3.1.

Block size is set as 8 × 8 for encoding the LR and residual images. Further, two dictionaries D L and D E of size 64 × 440 are trained using the bi-level dictionary learning algorithm presented in Section 3.2.3. We use the images from the CVG-Granada dataset1 for training. All these training images are partitioned into 8 × 8 image patches and 12,000 ones are randomly selected for training. One hundred epochs, each processing 12,000 training vectors, are considered during training process. For up-scaling, the dictionary pair of size 25 × 256 and the mapping matrix of size 256 × 256 are trained using the algorithm introduced in Section 3.2.2. The regularization parameters γ , λ , and λ m are empirically set as 0.1, 0.1, and 0.01, respectively.

Next, we present the rate-distortion graph for the test images in Figures 4 and 5 . Note that we also provide the results for different baseline algorithms, i.e. JPEG,2 JPEG2000,3 and DLC algorithm [12]. As these figures shows DLC and DCSR codecs provide a better performance (in terms of PSNR and SSIM) than the available image coding standards JPEG and JPEG2000. Interestingly, as the Figure 4 demonstrates, the DCSR algorithm enhances the quality of the image over the JPEG2000 codec. The enhancement depends on the statistics of the image, so different enhancement is observed in different images. Please note that SSIM shows the perceived quality of the compressed image, therefore, SSIM curves for DCSR and JPEG2000 are very close to each other. For a better visualization, we removed the rate-distortion performance for JPEG2000 in Figure 5 . As the trained dictionary performs a better capture and presents the contours more accurate, DCSR codec presents a higher performance compared to the analytic dictionaries such as the wavelet transform basis. Moreover, a 0.2 dBs improvement is observed using the DLC algorithm, marginally better for simpler images, like Mandrill.

Figure 4.

Rate-distortion performance of the DCSR codec compared to JPEG, JPEG2000 and DLC codecs in terms of PSNR for several test images (size 528 × 528 , grey-level).

Figure 5.

Rate-distortion performance of the DCSR codec compared to JPEG and DLC codecs in terms of MSSIM for several test images (size 528 × 528 , grey-level).

Several compressed images are shown in Figure 6 for visual comparisons. All images are coded with the same bit-rate (0.15) using JPEG and DCSR algorithm. As it can be seen, the JPEG standard fails to reconstruct images compressed at very low bit rates. In comparison, DCSR algorithm can preserve more image details. It is more effective in reconstruction of both the smooth area and the complex regions, including texture and edges, leading to visually much more pleasant recovery.

Figure 6.

Compressed images at bit-rate (0.15) using JPEG (left) and DCSR (right) scheme.

As a consequence, the down-sampling-based coding results in the high-frequency details in the image before performing the quantization lossy, hence larger number of bits are assigned low frequency information occupying more energy in the image. The presented bit assigning strategy brings a higher performance for images coded at low bit-rate. In higher bit-rates, the efficiency of coding is enhanced after the residual image is encoded as side information.

Advertisement

4. Conclusions

A brief summary of the signal modelling methodology has been given at the first part of chapter. We continued with the explicit and straightforward formulation of the sparse representation being more suitable for the compression tasks. We have just focused on the synthesis-based signal modelling because of being mature of the image compression using the analysis-based sparse signal modelling. An adaptive sparse representation over a trained over-complete dictionary was presented to compress the images. More specifically, given the saliency map of the image to be encoded, an image patch could be well represented with the linear combination of atoms selected from an over-complete and trained dictionary based on the sparsity level. Finally, at the end part of this chapter, a down-sampling operation is incorporated into the codes in order to improve the compression performance. The experimental results demonstrated that the presented image compression frameworks outperform image coding standards, such as JPEG and JPEG2000, which use an analytic dictionary.

References

  1. 1. Olshausen A, Field DJ. Sparse coding with an overcomplete basis set: A strategy employed by v1? Vision Research. 1997;16(4):3311-3325
  2. 2. Mallat S. A Wavelet Tour of Signal Processing. 3rd ed. Academic Press; 2008
  3. 3. Pennec EL, Mallat S. Bandelet image approximation and compression. SIAM Multiscale Modeling & Simulation. 2005;4(3):992-1039
  4. 4. Villegas OOV, Elias RP, Villela PR, Sanchez VGC, Salazar AM. Edging out the competition: Lossy image coding with wavelets and contourlets. IEEE Potentials. 2008;27(2):39-44
  5. 5. Wang S, Shu Z, Zhang L, Liu G, Gan L. Iterative image coding with overcomplete curvelet transform. In: Proceedings of IEEE Congress on Image and Signal Processing (CISP), vol. 1; May 2008; Hainan, China. pp. 666-670
  6. 6. Beferull-Lozano B, Ortega A. Coding techniques for oversampled steerable transforms. In: Proceedings of Asilomar Conference on Signals, Systems, and Computers, vol. 2; October 1999; Pacific Grove, CA. pp. 1198-1202
  7. 7. Akbari A, Mandache D, Trocan M, Granado B. Adaptive saliency-based compressive sensing image reconstruction. In: Proceedings of IEEE International Conference on Multimedia Expo Workshops (ICMEW); July 2016; Seattle, WA. pp. 1-6
  8. 8. Akbari A, Trocan M. Downsampling based image coding using dual dictionary learning and sparse representations. In: Proceedings of IEEE International Workshop on Multimedia Signal Processing (MMSP); August 2018. pp. 1-5
  9. 9. Akbari A, Trocan M. Sparse recovery-based error concealment for multiview images. In: Proceedings of IEEE International Workshop on Computational Intelligence for Multimedia Understanding (IWCIM); October 2015; Prague, Czech Republic. pp. 1-5
  10. 10. Akbari A, Trocan M, Granado B. Synthesis sparse modeling: Application to image compression and image error concealment. In: Proceedings of Signal Processing with Adaptive Sparse Structured Representations Workshop (SPARS); June 2017; Lisbon, Portugal
  11. 11. Mandache D, Akbari A, Trocan M, Granado B. Image compressed sensing recovery using intra-block prediction. In: Proceedings of IEEE International Conference on Telecommunications Forum (TELFOR); November 2015; Belgrade, Serbia. pp. 748-751
  12. 12. Akbari A, Trocan M, Granado B. Image compression using adaptive sparse representations over trained dictionaries. In: Proceedings of IEEE International Workshop on Multimedia Signal Processing (MMSP); September 2016; Montreal, Canada. pp. 1-6
  13. 13. Akbari A, Trocan M, Granado B. Image error concealment using sparse representations over a trained dictionary. In: Proceedings of IEEE Picture Coding Symposium (PCS); December 2016; Nuremberg, Germany. pp. 1-5
  14. 14. Akbari A, Trocan M, Granado B. Residual based compressed sensing recovery using sparse representations over a trained dictionary. In: Proceedings of International ITG Conference on Systems, Communications and Coding (SCC); February 2017; Hamburg, Germany. pp. 1-6
  15. 15. Akbari A, Trocan M, Granado B. Sparse recovery-based error concealment. IEEE Transactions on Multimedia. 2017;19(6):1339-1350
  16. 16. Akbari A, Trocan M, Granado B. Joint-domain dictionary learning-based error concealment using common space mapping. In: Proceedings of IEEE International Conference on Digital Signal Processing (DSP); August 2017. pp. 1-5
  17. 17. Akbari A, Trocan M, Granado B. Image error concealment based on joint sparse representation and non-local similarity. In: Proceedings of IEEE Global Conference on Signal and Information Processing (GlobalSIP); November 2017
  18. 18. Akbari A, Trocan M, Sanei S, Granado B. Joint sparse learning with nonlocal and local image priors for image error concealment. IEEE Transactions on Circuits and Systems for Video Technology. 2019
  19. 19. Akbari A, Trocan M, Granado B. Image compressed sensed recovery using saliency-based adaptive sensing and residual reconstruction. In: Compressed Sensing: Methods, Theory and Applications. New York: Nova Science Publishers; 2018
  20. 20. Akbari A, Trevisi M, Trocan M, Carmona-Galán R. Compressive imaging using rip-compliant cmos imager architecture and landweber reconstruction. IEEE Transactions on Circuits and Systems for Video Technology. 2019:1-1
  21. 21. Akbari A, Trocan M. Robust image reconstruction for block-based compressed sensing using a binary measurement matrix. In: 2018 25th IEEE International Conference on Image Processing (ICIP); October 2018. pp. 1832-1836
  22. 22. Akbari A, Trevisi M, Trocan M. Adaptive compressed sensing image reconstruction using binary measurement matrices. In: 2018 25th IEEE International Conference on Electronics, Circuits and Systems (ICECS); December 2018. pp. 659-660
  23. 23. Aharon M, Elad M, Bruckstein A. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing. 2006;54(11):4311-4322
  24. 24. Ahmed N, Natarajan T, Rao KR. Discrete cosine transform. IEEE Transactions on Computers. 1974;C-23(1):90-93
  25. 25. G. K. Wallace, “The jpeg still picture compression standard,” IEEE Transactions on Consumer Electronics, vol. 38, no. 1, pp. xviii-xxxiv, 1992
  26. 26. Skodras A, Christopoulos C, Ebrahimi T. The jpeg 2000 still image compression standard. IEEE Signal Processing Magazine. 2001;18(5):36-58
  27. 27. Rubinstein R, Bruckstein AM, Elad M. Dictionaries for sparse representation modeling. Proceedings of the IEEE. 2010;98(6):1045-1057
  28. 28. Pati YC, Rezaiifar R, Krishnaprasad PS. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In: Proceedings of Asilomar Conference on Signals, Systems and Computers; November 1993; Pacific Grove, CA. pp. 40-44
  29. 29. Candés E, Demanet L, Donoho D, Ying L. Fast discrete curvelet transforms. SIAM Multiscale Modeling & Simulation. 2006;5(3):861-899
  30. 30. Do MN, Vetterli M. Contourlets: A new directional multiresolution image representation. In: Proceedings of Asilomar Conference on Signals, Systems and Computers, vol. 1; November 2002; Pacific Grove, CA. pp. 497-501
  31. 31. Pennec EL, Mallat S. Sparse geometric image representations with bandelets. IEEE Transactions on Image Processing. 2005;14(4):423-438
  32. 32. Engan K, Skretting K, Husoy JH. Family of iterative LS-based dictionary learning algorithms, ILS-DLA, for sparse signal representation. Digital Signal Processing. 2007;17(1):32-49
  33. 33. Vidal R, Ma Y, Sastry S. Generalized principal component analysis (GPCA). IEEE Transactions on Pattern Analysis and Machine Intelligence. 2005;27(12):1945-1959
  34. 34. Sulam J, Ophir B, Zibulevsky M, Elad M. Trainlets: Dictionary learning in high dimensions. IEEE Transactions on Signal Processing. 2016;64(12):3180-3193
  35. 35. Lewicki MS, Sejnowski TJ. Learning overcomplete representations. Neural Computation. 2000;12(2):337-365
  36. 36. Kreutz-Delgado K, Murray JF, Rao BD, Engan K, Lee TW, Sejnowski TJ. Dictionary learning algorithms for sparse representation. Neural Computation. 2003;15(2):349-396
  37. 37. Gurumoorthy KS, Rajwade A, Banerjee A, Rangarajan A. A method for compact image representation using sparse matrix and tensor projections onto exemplar orthonormal bases. IEEE Transactions on Image Processing. 2010;19(2):322-334
  38. 38. Mazaheri JA, Guillemot C, Labit C. Learning a tree-structured dictionary for efficient image representation with adaptive sparse coding. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); May 2013; Vancouver, Canada. pp. 1320-1324
  39. 39. Sun Y, Tao X, Li Y, Lu J. Dictionary learning for image coding based on multisample sparse representation. IEEE Transactions on Circuits and Systems for Video Technology. 2014;24(11):2004-2010
  40. 40. Peyre G. A review of adaptive image representations. IEEE Journal of Selected Topics in Signal Processing. 2011;5(5):896-911
  41. 41. Borji A, Cheng MM, Jiang H, Li J. Salient object detection: A benchmark. IEEE Transactions on Image Processing. 2015;24(12):5706-5722
  42. 42. Harel J, Koch C, Perona P. Graph-based visual saliency. In: Proceedings of Neural Information Processing Systems (NIPS); 2006. pp. 545-552
  43. 43. Skretting K, Engan K. Image compression using learned dictionaries by rls-dla and compared with K-SVD. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); May 2011; Prague, Czech Republic. pp. 1517-1520
  44. 44. Wang S, Zhang L, Liang Y, Pan Q. Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR); June 2012; Providence, RI. pp. 2216-2223
  45. 45. Huang DA, Wang YCF. Coupled dictionary and feature space learning with applications to cross-domain image synthesis and recognition. In: Proceedings of IEEE Conference on Computer Vision (ICCV); December 2013; Sydney, Australia. pp. 2496-2503

Notes

  • http://decsai.ugr.es/cvg/dbimagenes
  • http://www.ijg.org
  • http://www.openjpeg.org

Written By

Ali Akbari and Maria Trocan

Submitted: 16 May 2019 Reviewed: 10 August 2019 Published: 12 September 2019