Open access peer-reviewed chapter - ONLINE FIRST

On the Application of Dictionary Learning to Image Compression

By Ali Akbari and Maria Trocan

Submitted: May 16th 2019Reviewed: August 10th 2019Published: September 12th 2019

DOI: 10.5772/intechopen.89114

Downloaded: 42

Abstract

Signal models are a cornerstone of contemporary signal and image-processing methodology. In this chapter, a particular signal modelling method, called synthesis sparse representation, is studied which has been proven to be effective for many signals, such as natural images, and successfully used in a wide range of applications. In this kind of signal modelling, the signal is represented with respect to dictionary. The dictionary choice plays an important role on the success of the entire model. One main discipline of dictionary designing is based on a machine learning methodology which provides a simple and expressive structure for designing adaptable and efficient dictionaries. This chapter focuses on direct application of the sparse representation, i.e. image compression. Two image codec based on adaptive sparse representation over a trained dictionary are introduced. Experimental results show that the presented methods outperform the existing image coding standards, such as JPEG and JPEG2000.

Keywords

  • image compression
  • dictionary learning
  • sparse representation

1. Introduction

Signal models are fundamental tools for efficiently processing of the signals of interesting, including audio recordings, natural images, video clips, and medical scans, to name just a few. A signal model formulates a mathematical description of the family of signals of interesting in a way which faithfully captures their behaviour. Designing accurate signal models, which efficiently capture useful characteristics of the signals, has been a crucial aim in the signal processing area for so many years and a variety of mathematical forms has been proposed. Sparsity-based modelling has been used in many applications in which each signal is represented in terms of linear combinations of an underlying set, called dictionary, of elementary signals known as atoms, resulting in simple and compact models. The driving force behind this model is sparsity, i.e. the rapid decay of the representation coefficients over the dictionary. In this signal modelling, the dictionary plays an important role for the success of entire model in an efficient representation of the signal.

Finding appropriate dictionaries with good predictive power of various signal classes of interest and high compactness ability, especially natural images, has been an active field of research during past decades. The early attempts for designing dictionaries were based on building the model using harmonic analysis of the signal classes and extracting some mathematical functions, resulting in a fixed off-the-shelf dictionary called analytic or mathematical dictionary. The sparse representation using these fixed mathematical dictionaries is called analysis sparse modelling. The long series of works on designing the analytic dictionaries lead to appearing various transforms such as Fourier and its discrete version, discrete cosine, wavelets, curvelets, contourlets, bandlets, and steerable wavelets.

A significantly different approach to the sparse modelling, originally introduced by Olshausen and Field [1], consists of learning a dictionary from some training data. The sparse representation using this trained dictionary is called, synthesis sparse modelling. The trained dictionaries, also called synthetic dictionary, could efficiently capture the underlying structures in the natural image patches and are well adapted to a large class of natural signals.

The analysis and synthesis sparse signal modelling has led to the design effective algorithms for many image-processing applications, such as compression [2, 3, 4, 5, 6, 7, 8] and solving inverse problems [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]. A straightforward application of the sparse signal modelling in the field of image processing has been image compression due to providing a compact representation of the signal. This chapter presents the theoretical and practical aspects of using the dictionary-based sparse signal modelling for image compression. We address one important question: How to efficiently represent an image over a trained dictionary in order to improve the performance of the image compression? Based on this concept, we present two novel image compression methods. Experimental results show that the introduced methods outperform the existing image coding standards, such as JPEG and JPEG2000.

The remainder of chapter is organized as follows: It starts with introducing the concept of sparsity in signal processing. The use of sparse representation modelling with respect to trained dictionaries is presented. Next, a brief description of a well-known and yet effective dictionary learning algorithm, introduced by Aharon et al. [23], is given. The end of chapter is devoted to introducing two generic image compression schemes. Finally, we conclude the chapter.

2. Sparsity-based signal models

One of the well-known methods in designing of the signal models is linear approximation. In this modelling technique, given a set of vectors dkRNk=0K1, a signal xRNis represented as a linear combination of Kbasis,

xk=0K1ckdk,E1

where set ckk=0K1consists of representation coefficients. The signal approximation Eq. (1) can be reformulated in a matrix form as

xDc,E2

where c=c1c2ckTRKis the coefficients vector. The matrix D=d1d2dKRN×Kis called dictionary and its columns constitute the dictionary atoms.

With the right choice of dictionary, the coefficients vector cis expected to be sparse, in the sense that its sorted coefficients decay rapidly. Motivated by this idea, the design of efficient complete dictionaries, in which, i.e., K=Nwas an active area of research during the last decades of twentieth century. The well-known Fourier transform [24], used in the JPEG compression standard [25], and wavelet transform [2], used in JPEG2000 compression standard [26], are the results of these works which sparsify uniformly the smooth signals. However, this signal modelling loses its optimality in representation of the image signals due to existence of curve singularities (elongated edges) in these types of signals [27]. As an instance, the images encoded by the JPEG2000 standard suffer from the ringing (smoothing) artefacts near edges.

In an attempt to minimize this weakness of the dictionaries, the design of more general over-complete dictionaries, which have more atoms than the dimension of signal, i.e. K>N, has been investigated over the past decades, and is still intensely ongoing. These dictionaries have a more descriptive ability to represent a wide range of interesting signals, in comparison with the invertible complete dictionaries.

2.1 Sparse modelling using over-complete dictionaries

Compared to the complete case, representation with over-complete dictionaries must be more carefully defined. There are two distinct paths for representing a signal using the over-complete dictionaries: analysis path and synthesis path. The analysis sparse modelling relies on the classical basics of the signal modelling in which the representation of the signal is identified as a linear combination of atoms,

xDca,E3

where the coefficients vector cais obtained via the inner products of the signal and the dictionary ca=Ωx=DTx. This method has the advantage of providing a simple and efficient way to achieve sparse representation over the dictionary. In this case, every signal has an unique representation as a linear combination of the dictionary atoms.

Increasing sparsity, in order to obtain a well-defined representation, requires departure from this linear representation towards a more flexible and non-linear representation. Each signal is represented using a different set of atoms from a pool, called dictionary, in order to achieve the best sparsity. Thus, the approximation process becomes

xDcs,E4

where the coefficients vector csis obtained via a non-linear approach, in contrast to the linear-based approach in the analysis path. This signal modelling approach, called synthesis sparse representation, needs further refinement to find the well-defined representation due to degrees of freedom identified by the null-space of D[27], which leads to a non-unique choice of csin Eq. (4), as opposed to the analysis sparse modelling which has a unique solution. In order to find the most informative representation, the coefficients vector csis obtained with respect to some cost function F, which minimizes the sparsity of the coefficients vector csunder a reconstruction constraint:

cs=arg mincRKFcsubject toxDc22ϵ,E5

where ϵis the prior knowledge about the noise level. The penalty function Fis defined in a way that is tolerant to the large coefficients and aggressively penalizes the small coefficients [27]. The normal choice for this function is pnorm with 0p1. Of specific interest is the 0case, i.e., Fc=c0, which counts the number of non-zeros in the representation. For this case, the problem Eq. (5) becomes

cs=arg mincRKc0subject toxDc22ϵ.E6

This problem, known to be NP-hard in general, can be efficiently approximated based on the idea of iterative greedy pursuit. The earliest and yet effective one includes the orthogonal matching pursuit (OMP) [28].

This formulation Eq. (6) has gained a large success beyond the statistics and signal processing communities and has been extensively employed in different signal processing algorithms. In the image-processing applications, since the size of natural images is too large, it is chosen to partition the image into blocks and the sparse modelling is done on the set of image blocks X=x1x2xL, each of size N×Npixels, where Nis an integer value and xiRNis lexicographically stacked representation of the i-th image patches.

2.2 Dictionary choice

In the discussion so far, it is assumed that the dictionaries of the analysis and synthesis models are known. Choosing the dictionary carefully is an important and involving task, in which substantial research has been invested. Based on analysis and synthesis models, the scientific community has developed two main routes for designing the dictionaries [27].

The first one is analytic dictionaries derived from a set of mathematical assumptions made on the family of the signals. The dictionaries of this type are generated by finding the appropriate mathematical functions through harmonic analysis of the interesting signals for which an efficient representation is obtained. For instance, Fourier basis is designed for optimal representation of smooth signals, while the wavelet dictionary is more suitable for piecewise-smooth signals with point singularities.

Designing analytic over-complete dictionaries are formulated as DDTx=xfor all x. Then, the approach tries to establish an appropriate dictionary by analysing the behaviour of DTxand establishing a decay rate. The curvelet [29], contourlet [30], and bandlet [31] transforms are some of the analytic dictionaries which provide comprehensive frameworks in order to handle the multi-dimensional signals.

Finding the more compact sparse representation has been a major driving force for the continued development of more efficient dictionaries. The synthesis formulation of the sparse representation paved the way to the design of an efficient dictionary, called synthetic dictionaries, from signal realizations via machine-learning techniques. The basic assumption behind this approach is that the structure of the complex natural signals can be more accurately extracted directly from the data than by using a general mathematical model [27]. In fact, this approach replaces prior assumptions on the signal behaviour with a training process which constructs the dictionary based on the observed signal properties. Compared to the analytic dictionaries, the synthetic dictionaries deliver an increased flexibility and the ability to adapt to specific signals and are superior in terms of representation efficiency at the cost of a non-structured and substantially more complex dictionary.

In this approach, a dictionary is trained for the sparse representation of small patches collected from a number of training signals. The desire to efficiently train a dictionary for the sparse representation led to developing some algorithms so far [1, 23, 32, 33, 34]. The earlier works on the dictionary learning mostly focused on statistical methods. Given the training image patches X=x1x2xL, where Lis the number of training patches, this method finds a dictionary which either maximizes the likelihood of the training data PXD[35] or the posterior probability of the dictionary PDX[36]. These formulations lead to some optimization problems that are solved in an expectation-maximization fashion, alternating estimation of the sparse representation and the dictionary using gradient descent or similar methods.

In neuroscience area, Olshausen and Field [1] proposed a significantly different approach for designing the dictionary using the training data, benefiting from modelling the receptive fields of simple cells in the mammalian primary visual cortex. In another attempt, the K-SVD algorithm introduced by Aharon et al. [23] is one of the well-known methods of dictionary learning. Given a set of examples X=x1x2xL, the goal of the K-SVD algorithm is to search the best possible dictionary DRN×Kfor the sparse representation of the training set Xthrough the optimization problem of Eq. (7):

arg minC,Di=1Lci0subject toXDC22ϵ,E7

where ϵis a fixed small value and C=c1c2cLis a matrix of size K×L, consisting of the representation coefficients vectors cii=1Lof the training samples. The expression in Eq. (7) is performed iteratively. First, by considering an initial dictionary, the algorithm tries to find the best coefficients matrix Cthat can be found. Once Dis known, the penalty, posed in Eq. (6), reduces to a set of Lsparse representation operations, like the ones seen in (Section 2.1). The OMP algorithm [28] is used for the near-optimal calculation of the coefficients matrix C.

At the next stage, the columns of dictionary are sequentially updated and relevant coefficients in the matrix Care simultaneously changed. At a time, one column is updated and the process of updating one column is based on the singular value decomposition (SVD) on the residual data matrices, computed only on the training samples that use this atom. The K-SVD algorithm includes a mechanism to control and rescale the 2-norm of the dictionary elements. Indeed, without such a mechanism, the norm of Dwould arbitrarily go to infinity. For more details, refer to [23].

In the image-processing applications, since the size of natural images is too large for learning a full matrix D, it is chosen to learn the dictionary on a set of natural image patches X=x1x2xL, each of size N×Npixels, where Nis an integer value and xiRNis lexicographically stacked representations of the i-th image patches.

2.3 Analysis versus synthesis

As mentioned before and also outlined in [27], some of the most important elements of effective dictionary design include localization, multi resolution, and adaptivity. Modern dictionaries typically provide localization in both the analysis and synthesis routes. However, multi-resolution property is usually better supported by the analytic structures, whereas adaptivity is mostly found in the synthetic methods.

The most important advantage of the analytic dictionaries is the easy and fast implementation. On the other hands, the main advantage of the trained dictionaries is their ability to provide a much higher degree of specificity to the particular signal properties, allowing them to produce better results in many practical applications such as image compression, feature extraction, content-based image retrieval and others. However, some drawbacks might arise after the compactness introduced by synthesis scheme. In this case, only if a few atoms for presenting a signal, the importance of all atoms largely varies. Consequently, any wrong choice of one atom could potentially lead to additional erroneous atoms that are selected as compensation, deviating further from the desired reconstruction. This weakness is usually appeared in the 0-norm-based non-convex optimization problem of Eq. (6). The convex relaxation approaches from 0to 1are more stable for the sparse representation at the expense of computational complexity. In the analysis formulation, however, all atoms take an equal part in describing the signal, thus minimizing the dependence on each individual one, and stabilizing the recovery process.

3. Image compression

Reducing the cost for storage or transmission of image signals with negligible degradation in the quality is the main goal of an lossy image compression algorithm. It tries to remove the redundancies among the image data by adopting different ways, such as transferring the image into a transform domain with compressible coefficients. In the transform domain, a few significant coefficients capture a large part of the image information. A typical lossy image compression algorithm usually encodes these significant transform coefficients to reduce the requirements for image storage or transmission. The analysis and synthesis sparse modellings are two powerful tools for transforming the image into a compressible domain. The JPEG [25] and JPEG2000 standards [26] are the results of using the analysis sparse representation of the image by designing analytic dictionaries, e.g. discrete cosine transform (DCT) and discrete wavelet transform (DWT), respectively. Since analytic dictionaries-based image sparse representation is typically over-simplistic, it fails to represent the high-textured images efficiently [27]. Due to this weakness of analysis sparse modelling in the efficient expressiveness, an extensive body of literature has recently focused on various applications of the synthesis sparse signal modelling via a trained dictionary. In this way, the performance can be significantly improved for the image compression application, benefiting the sparse representation of the image over a dictionary specifically adapted to it.

In order to improve the limitations of the traditional sparse representation approaches over a trained dictionary, several studies have been proposed in the literature [37, 38, 39]. In [37], the authors train a set of dictionaries. In order to improve the compression performance, the sparse representation is achieved by choosing an optimal dictionary among the trained dictionaries. In [38], a set of dictionaries in a tree structure is trained. At each level of tree, a dictionary is learned via generating the image residual. The residual is obtained by difference between the original and recovered images using the trained dictionary at the previous level of tree. Sparse representation is done by selection of one atom from each dictionary at each tree level. The total number of atoms is determined according to the sparsity level. Authors in [39] introduce the concept of multi-sample sparse representation (MSR) and incorporate it into the dictionary learning process. Each image patch is encoded with a certain sparsity level. To do this purpose, multiple neighbouring image patches are considered during the sparse representation to explore different sparsity levels. Based on this concept, an MSR-based image coding approach is proposed in [39].

Using a trained dictionary, each image is represented over the dictionary. The coefficients vector can be obtained via different algorithms such as the basis pursuit algorithms, matching pursuit techniques and other schemes [40]. These conventional approaches usually consider constant sparsity level, i.e. a fixed number of dictionary atoms is considered for representing all the image patches with different characteristics. This approach leads to a weak image compression performance. In this section, an adaptive sparse representation approach is presented. It is based on this fact that the visual significance, called visual saliency, of each image patch varies with its location within the image [41]. In other words, the human visual system (HVS) usually focuses on some parts of the image (salient regions), while other parts of the image have a lower level of visual interest. Therefore, designing an adaptive sparse representation scheme by considering the HVS characteristics plays an important role in designing an efficient image compression algorithm.

3.1 Dictionary learning-based image codec

Figure 1 presents the block diagram of the dictionary learning-based image coding (DLC) framework [7]. The DLC mainly has four main parts, pre-processing, dictionary learning, adaptive sparse representation and entropy coding. First, at the pre-processing step, the input image is divided into Lnon-overlapping image patches X=xii=1L. xiNrepresents a B×Bimage patch vectorized into a vector of size N. As other coding algorithms, the mean values of image patches (DC components) M=mii=1Land AC components Y=yii=1Lare separately encoded. The AC components are represented with respect to a trained dictionary Dof size N×Klearned by the K-SVD dictionary learning algorithm [23]. As other dictionary-based image compression methods, the trained dictionary should be shared between encoder and decoder. The OMP algorithm is used for sparse representation step in an adaptive manner using the visual saliency information. Incorporating this adaptive sparse representation step into the image coding framework aims to further reduce the reconstructed errors. Finally, the DC elements M=mii=1Land representation coefficients C=cii=1Lare entropy coded, where ciKdenotes the coefficients vector of the i-th image patch. In the following sections, the adaptive sparse representation and entropy coding steps are detailed. We ignore the details of decoder. However, it should be noted that the decoder can retrieve the image by a minor application of the above steps.

Figure 1.

Block diagram of the dictionary-based image coding (DLC) framework.

3.1.1 Adaptive synthesis sparse representation

The sparse representation step has a strong impact on effectively encoding of image patches and thus the rate-distortion performance. Graph-based visual saliency (GBVS) model, proposed in [42], has an capability to extract the saliency map of an image, i.e. locations within an image where have a high visual interest to a human observer. By incorporating this model into the sparse representation step, we build up an image coding scheme which compresses the image more efficiently than the traditional coders. Please refer to [42] for the details of the GVBS model. This saliency map of the image is exploited to determine the visual significance of the image patches in order to allocate different sparsity levels to each patch according to its visual significance to the HVS.

It should be noted that we normalize the saliency map values to the range 01. Therefore, the intensity of each pixel of this normalized saliency map stands for the probability belonging that pixel to a salient region. Some examples of saliency maps are shown in Figure 2. In the second row, brighter regions represent the salient locations which a human observer pays more attention to, while the darker areas represent the less-saliency regions.

Figure 2.

The original images and their saliency maps.

To obtain the saliency value of each image patch, first, the saliency map is partitioned into non-overlapping blocks of size B×Bpixels. Then, the average of the saliency values of pixels belonging to each block is considered as the saliency value of the corresponding patch. Let Hidenotes the saliency value of i-th image patch. Since saliency value of each image patch varies with its position within the image; therefore, by assigning different sparsity level to each block according to its saliency value leads to an improvement in the rate-distortion performance.

Consider the sparsity level of i-th image patch as:

Si=αiS.E8

Given the target sparsity level S, we aim to allocate a different sparsity level Sito the i-th image patch. It should be noted that the average of sparsity levels of all image patches must be equal (or slightly inferior) to the target sparsity level S. The factor αican be easily obtained via:

αi=L×Hii=1LHi.E9

where Hii=1Ldenotes the set of saliency values. As a result, a set of sparsity levels Sii=1Lis obtained via Eq. (8). Based on these new obtained sparsity levels, the sparse representation of each patch xiover the dictionary Dis achieved by:

argmincixiDci2 Subject To ci0Si.E10

The OMP method in [43] is used to solve this problem. By assigning different sparsity level Sito each block, a more effective sparse representation of the image is obtained.

3.1.2 Quantization step and entropy coding

Differential pulse-coded modulation (DPCM) scheme is used to quantize the DC elements. Let ei=mimi1denotes the residual between DC values of two neighbouring patches iand i1(here we set m0as 0). Instead of encoding M=mii=1Ldirectly, the residuals between subsequent DC elements, i.e. E=eii=1Lare quantized and encoded. The quantization is achieved via roundei/bwhere bis a constant value. A dead-zone quantizer is used for quantization of the residuals [43].

In order to further remove the redundancy, the Huffman coder is used to entropy-code the quantized DC values. During the entropy coding step, pre-defined code-word tables are used built offline. The same procedure is employed for quantizing and encoding the nonzero coefficients of C=cii=1L.

A big part of the bit-stream generated by the above scheme is occupied by encoding the indices of the representation coefficients, I. The reason behind this fact is that the non-zero coefficients in C=cii=1Lhave a random structure. To efficiently encode this random pattern, a quad-tree splitting algorithm is employed. First, each vector ciis divided into two equal parts. At the next step, a binary evaluation is achieved on each part: if each part consists of one nonzero coefficient, the encoder outputs 1; otherwise, the encoder sends 0. Each part including one nonzero coefficient at least, is then divided into two separate parts and the binary evaluation is again performed. This process continues until the maximum depth of partitioning is reached. In comparison with other methods like fixed length coding [39], this quad-tree splitting algorithm encodes the indices of nonzero coefficients more efficient.

3.2 Down-sampling-based codec

In a basic image compression approach, a down-sampling strategy is employed at the encoder side, as another lossy operation. Then, at the decoder side, an up-scaling algorithm is employed to reconstruct the high resolution (HR) image. In this technique, the high-frequency details within the image are removed by employing the down-sampling operator before the quantization lossy operator. Thus low frequency information within an image, where contain most of the energy image, is encoded by higher bit-rate. This bit allocation scheme leads to the better rate-distortion performance at the low bit-rates. Although the down-sampling operator improves the rate-distortion performance at low bit-rates, it eliminates the high frequency (HF) details. These information should be reproduced in order to recover a high quality image at the higher bit-rates.

In the DLC encoder, presented in previous section, the quantization and sparse representation are the main compression tools. In this section, the down-sampling operator is incorporated into the DLC codec as another lossy operator to further remove redundancies existing within an image. Instead of encoding the whole image, its low-resolution (LR) version is encoded. At the decoder side, an up-scaling algorithm is carefully designed to recover the high-frequency details. As discussed before, this strategy enhances the coding efficiency at the low bit-rates. However, designing an efficient up-scaling algorithm is an important step at the decoder side by which the high frequency information can be correctly recovered. A failure in designing a good up-scaling algorithm leads to a weak rate-distortion performance at the high bit-rates. The presented up-scaling scheme in this section recovers the HR image by encoding and sending the residual image, i.e. difference between the original image and the up-scaled one, as side information. The encoder generates the residual image and encodes it by the sparse representation of the residual image over a trained dictionary. Decoder recovers the final image by combination of the LR decoded image and the reconstructed residual. Both dictionaries used for the sparse representation of LR residual images are trained by a bi-level dictionary learning algorithm. Further, an image analyser is designed and incorporated into the encoder. The goal of this image analyser is for designing an adaptive sparse representation in order to assign higher bit-rate to the salient parts of the image. Experimental results illustrate that this down-sampling-based image compression scheme based on sparse representation (DCSR) achieves better rate-distortion performance when compared with the conventional codecs, such as JPEG and JPEG2000, and the DLC codec, described in Section 3.1.

The block diagram of the DCSR encoder is shown in Figure 3. This framework mainly consists of five parts, including down-sampling, up-scaling, sparse representation, quantization and entropy coding [8]. An image analyser is also introduced to the core of codec. At the pre-processing step, the down-sampling operator uses a blurring convolution kernel and a simple decimator by three to produce the LR image with only 1/9 of total pixels of the original image. At the up-scaling step, the down-sampled image is restored back to its original resolution. The goal of the up-scaling step is designing an accurate algorithm to improve the rate-distortion performance at the higher bit-rates. A joint sparse representation method is employed to solve this ill-posed, complex inverse problem. This method guarantees to restore the high frequency details which are enough and sufficient for reconstruction of the HR image.

Figure 3.

Block diagram of the DCSR codec.

Assume the LR image, XL, is obtained via XL=HXH, where His the down-sampling operator and XHdenotes the HR image. After the image recovery via the up-scaling algorithm, the relationship between the HR and the reconstructed images is described as XH=X˜H+E, where Eis the residual image. The LR image XLand the residual image Eare separately partitioned into non-overlapping image patches of size B×B. Let us assume xii=1TLand eii=1TEdenote the vectorized image blocks in the XLand E, respectively, where TLand TEare the number of patches in the XLand E, respectively. Moreover, note that TL=1/9TE.

At the sparse representation step, the LR image patches xii=1TLare sparsely represented over an over-complete dictionary DLof size N×Kusing the OMP algorithm. An image analyser is carefully designed to use the visual saliency information at the sparse representation step. The goal of the analyser is to further reduce the reconstructed errors in the areas within the image where have higher visual interest to the HVS. The same procedure is also applied for the sparse representation of the image patches of the residual image eii=1TE. A different over-complete dictionary DEof size N×Kis used for the sparse representation of the residual patches. The two dictionaries DLand DEare trained offline using a bi-level dictionary learning algorithm. It is assumed that these dictionaries are shared between the encoder and decoder. Finally, the obtained sparse coefficients of LR patches and residual patches, i.e. CL=ciLi=1TLand CE=ciEi=1TErespectively, are quantized and entropy coded in order to obtain the bit-stream. ciLRKand eiLRKdenote the coefficients vector of the i-block in the LR image XLand the residual image E, respectively.

At the decoder side, the image is reconstructed by a minor application of the above steps. First, the reconstructed LR image X̂Lis up-scaled by the up-scaling operator to restore the image X̂Hof size of the original image. Then, the final restored image is obtained via XD=X̂H+Ê, where Êdenotes the decoded residual image.

3.2.1 Image analyser for adaptive sparse representation

The main goal of image analyser is to enhance the performance of the DCSR encoder. It reduces the reconstructed error in some parts of the image which have higher visual interest to a human observer. Therefore, the image analyser extracts the salient regions within the image according to its visual significance to the HVS. Then, it allocates higher rates to the salient regions by assigning different sparsity levels to the image patches, located within the salient parts, in both LR and residual images. As mentioned before, the GBVS model [42] is used to obtain the salient regions within an image. Some examples of saliency maps are shown in Figure 2.

For finding the sparsity levels of the image patches in the LR image, the obtained saliency map is re-sized to the same size of the LR image. Then, it is divided into B×Bblocks to obtain the corresponding saliency value Hiof the i-th block by taking the average of the saliency values of pixels within that block. The goal is assign a different sparsity level Sito each block. Note that the average of the sparsity levels assigned to all blocks within the image should be equal (or slightly inferior) to the target sparsity level S. Given the target sparsity level Sand the set of saliency values Hii=1TL, the sparsity level of the i-th block is obtained by:

SiL=TL×Hii=1TLHiS,E11

where TLis the number of blocks in the LR image. Based on this strategy, different sparsity levels are allocated for the sparse representation of the image patches in the LR image. Given the obtained sparsity levels SiLi=1TL, the sparse representation of the LR patch xiin the LR image over the dictionary DLis achieved by [44]:

argminciLxiDLciL2, s.t. ciL0SiL.E12

The same procedure is applied to obtain the sparsity levels SiEi=1TEfor the image patches in the residual image. The set of saliency values Hii=1TEis obtained using the saliency map of the original image, as described before. The sparse representation of each patch eiin residual image with respect to the dictionary DEis achieved by:

argmineiEeiDEciE2, s.t. ciE0SiE.E13

The OMP method is used to solve these problems. At the low bit-rates, DCSR encodes only the LR image. At the higher bit-rates, DCSR encodes the residual image and send it as side information in order to improve the coding efficiency. In this case, the patches within the residual and LR images are represented over the corresponding dictionaries with the same target sparsity level S.

3.2.2 Joint dictionary-mapping learning for up-scaling

The goal of up-scaling algorithm is to recover the high frequency information eliminated during the down-sampling operation. One well-known up-scaling approach is to learn a function which maps the LR and HR image patches. This mapping function can be obtained using two training databases of the LR and HR image patches. Instead of finding the direct mapping function between the LR and HR patches, authors in [44] show that the mapping function can be efficiently learned in the transform domain. In this approach, first, the LR and HR patches are transferred into the representation space over the trained dictionaries and then the relationship between representation coefficients is learned.

Let two sets Xand Yconsist of Ntraining LR and corresponding HR image patches, respectively. Suppose DxRM×Kand DRMy×Kdenote the trained dictionaries for the sparse representation of the LR and HR patches. These two dictionaries are trained jointly such that the sparse representation of all training LR and HR patches are mapped to each other via a linear function MRK×K. The dictionaries and mapping matrix are obtained by solving the following minimization problem:

arg minDx,Dy,Λx,Λy,MXDxΛx22+YDyΛy22+γΛxy22+λΛx1+λΛy1+λmM22,E14

where ΛxRK×Nand ΛyRK×Nrepresent the corresponding sparse representation matrices. The terms γ, λand λmdenote regularization parameters. As proposed in [45], the minimization problem Eq. (14) is solved in an iterative approach with respect to a variable while all the other variables are considered as fixed parameters.

After training process, the up-scaling is achieved by the dictionary pair (i.e. Dxand Dy) and the mapping matrix M. First, the LR image is divided into overlapping blocks of size Bs×Bs. Then, the sparse representation vector αyof an LR patch yis obtained with respect to the trained dictionary Dy. In the minimization problem Eq. (14), we use 1-norm to train the dictionaries. Given these trained dictionaries, the up-scaled algorithm uses 0-norm, instead of 1norm, in order to improve the rate-distortion performance. Thus, the sparse representation of LR patch αyis obtained by solving the following minimization problem:

αy=arg minαyDyα22+δα0,E15

where δdenotes the regularization parameter. At the next step, we derive α̂x=Mαyand restore the HR patch via x̂=Dxα̂x. At the last step, the restored up-scaled image X̂His obtained by replacing the recovered image patches into the whole image grid and taking the average over the overlapped pixels. For more details about solving the above optimization problems, please refer to [44].

3.2.3 Bi-level dictionary learning algorithm for image compression

The up-scaling algorithm, described in previous section, leads to a bi-level dictionary learning algorithm which used for image compression at the DCSR encoder. Note that the trained dictionary DLand the up-scaling operator are used to estimate a large part of signal energy. The remained energy of residual is captured by representing the residual over another trained dictionary. This residual is encoded as side information to improve the quality of the restored image at the higher bit-rates. The bi-level dictionary learning algorithm is achieved by the following steps: first, the training images are down-sampled by a factor 3. Second, these LR images are employed to train a dictionary DLby the K-SVD algorithm [23]. Third, the LR training images are represented with respect to the trained dictionary DL. Forth, the reconstructed LR images are then up-scaled by the algorithm described in previous section. Fifth, these up-scaled images are used to create the training residual images which are subsequently used to train the second dictionary DRby the K-SVD algorithm.

3.2.4 Quantization and entropy coding

Differential pulse-coded modulation (DPCM) is used to quantize the non-zero elements in the sparse representation vectors CLand CE. A dead-zone quantizer is used for quantization. Further, the classical Huffman coding is employed to entropy-code the quantized sparse coefficients. Predefined code-word tables, needed for Huffman encoder, are constructed offline and stored at both encoder and decoder sides.

The location of non-zero coefficients in CLand CEfollow a random pattern. Therefore, encoding the indices of these coefficients occupies a big part output bit-stream. For efficiently encoding these indices I, the quad-tree splitting algorithm, explained in Section 3.1.2, is employed.

3.3 Experimental results

In this section, the rate-distortion performance of the DCSR codec is examined by performing a suite of experiments on a set of 8-bit grey-scale standard images. All evaluated images are re-sized to size of 528×528pixels in order to produce the LR image with only 1/9 of total pixels of the original image. The well-known peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) index are chosen as a measure in the experiments. Further, the rate distortion performance is compared with the JPEG and JPEG2000 standards, as well as the DLC codec, presented in Section 3.1.

Block size is set as 8×8for encoding the LR and residual images. Further, two dictionaries DLand DEof size 64×440are trained using the bi-level dictionary learning algorithm presented in Section 3.2.3. We use the images from the CVG-Granada dataset1 for training. All these training images are partitioned into 8×8image patches and 12,000 ones are randomly selected for training. One hundred epochs, each processing 12,000 training vectors, are considered during training process. For up-scaling, the dictionary pair of size 25×256and the mapping matrix of size 256×256are trained using the algorithm introduced in Section 3.2.2. The regularization parameters γ, λ, and λmare empirically set as 0.1, 0.1, and 0.01, respectively.

Next, we present the rate-distortion graph for the test images in Figures 4 and 5. Note that we also provide the results for different baseline algorithms, i.e. JPEG,2 JPEG2000,3 and DLC algorithm [12]. As these figures shows DLC and DCSR codecs provide a better performance (in terms of PSNR and SSIM) than the available image coding standards JPEG and JPEG2000. Interestingly, as the Figure 4 demonstrates, the DCSR algorithm enhances the quality of the image over the JPEG2000 codec. The enhancement depends on the statistics of the image, so different enhancement is observed in different images. Please note that SSIM shows the perceived quality of the compressed image, therefore, SSIM curves for DCSR and JPEG2000 are very close to each other. For a better visualization, we removed the rate-distortion performance for JPEG2000 in Figure 5. As the trained dictionary performs a better capture and presents the contours more accurate, DCSR codec presents a higher performance compared to the analytic dictionaries such as the wavelet transform basis. Moreover, a 0.2 dBs improvement is observed using the DLC algorithm, marginally better for simpler images, like Mandrill.

Figure 4.

Rate-distortion performance of the DCSR codec compared to JPEG, JPEG2000 and DLC codecs in terms of PSNR for several test images (size 528 × 528 , grey-level).

Figure 5.

Rate-distortion performance of the DCSR codec compared to JPEG and DLC codecs in terms of MSSIM for several test images (size 528 × 528 , grey-level).

Several compressed images are shown in Figure 6 for visual comparisons. All images are coded with the same bit-rate (0.15) using JPEG and DCSR algorithm. As it can be seen, the JPEG standard fails to reconstruct images compressed at very low bit rates. In comparison, DCSR algorithm can preserve more image details. It is more effective in reconstruction of both the smooth area and the complex regions, including texture and edges, leading to visually much more pleasant recovery.

Figure 6.

Compressed images at bit-rate (0.15) using JPEG (left) and DCSR (right) scheme.

As a consequence, the down-sampling-based coding results in the high-frequency details in the image before performing the quantization lossy, hence larger number of bits are assigned low frequency information occupying more energy in the image. The presented bit assigning strategy brings a higher performance for images coded at low bit-rate. In higher bit-rates, the efficiency of coding is enhanced after the residual image is encoded as side information.

4. Conclusions

A brief summary of the signal modelling methodology has been given at the first part of chapter. We continued with the explicit and straightforward formulation of the sparse representation being more suitable for the compression tasks. We have just focused on the synthesis-based signal modelling because of being mature of the image compression using the analysis-based sparse signal modelling. An adaptive sparse representation over a trained over-complete dictionary was presented to compress the images. More specifically, given the saliency map of the image to be encoded, an image patch could be well represented with the linear combination of atoms selected from an over-complete and trained dictionary based on the sparsity level. Finally, at the end part of this chapter, a down-sampling operation is incorporated into the codes in order to improve the compression performance. The experimental results demonstrated that the presented image compression frameworks outperform image coding standards, such as JPEG and JPEG2000, which use an analytic dictionary.

Notes

  • 1http://decsai.ugr.es/cvg/dbimagenes
  • http://www.ijg.org
  • http://www.openjpeg.org

Download

chapter PDF

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Link to this chapter Copy to clipboard

Cite this chapter Copy to clipboard

Ali Akbari and Maria Trocan (September 12th 2019). On the Application of Dictionary Learning to Image Compression [Online First], IntechOpen, DOI: 10.5772/intechopen.89114. Available from:

chapter statistics

42total chapter downloads

More statistics for editors and authors

Login to your personal dashboard for more detailed statistics on your publications.

Access personal reporting

We are IntechOpen, the world's leading publisher of Open Access books. Built by scientists, for scientists. Our readership spans scientists, professors, researchers, librarians, and students, as well as business professionals. We share our knowledge and peer-reveiwed research papers with libraries, scientific and engineering societies, and also work with corporate R&D departments and government entities.

More About Us