## Abstract

Signal models are a cornerstone of contemporary signal and image-processing methodology. In this chapter, a particular signal modelling method, called synthesis sparse representation, is studied which has been proven to be effective for many signals, such as natural images, and successfully used in a wide range of applications. In this kind of signal modelling, the signal is represented with respect to dictionary. The dictionary choice plays an important role on the success of the entire model. One main discipline of dictionary designing is based on a machine learning methodology which provides a simple and expressive structure for designing adaptable and efficient dictionaries. This chapter focuses on direct application of the sparse representation, i.e. image compression. Two image codec based on adaptive sparse representation over a trained dictionary are introduced. Experimental results show that the presented methods outperform the existing image coding standards, such as JPEG and JPEG2000.

### Keywords

- image compression
- dictionary learning
- sparse representation

## 1. Introduction

Signal models are fundamental tools for efficiently processing of the signals of interesting, including audio recordings, natural images, video clips, and medical scans, to name just a few. A signal model formulates a mathematical description of the family of signals of interesting in a way which faithfully captures their behaviour. Designing accurate signal models, which efficiently capture useful characteristics of the signals, has been a crucial aim in the signal processing area for so many years and a variety of mathematical forms has been proposed. Sparsity-based modelling has been used in many applications in which each signal is represented in terms of linear combinations of an underlying set, called *dictionary*, of elementary signals known as atoms, resulting in simple and compact models. The driving force behind this model is *sparsity*, i.e. the rapid decay of the representation coefficients over the dictionary. In this signal modelling, the dictionary plays an important role for the success of entire model in an efficient representation of the signal.

Finding appropriate dictionaries with good predictive power of various signal classes of interest and high compactness ability, especially natural images, has been an active field of research during past decades. The early attempts for designing dictionaries were based on building the model using harmonic analysis of the signal classes and extracting some mathematical functions, resulting in a fixed off-the-shelf dictionary called *analytic* or mathematical dictionary. The sparse representation using these fixed mathematical dictionaries is called *analysis* sparse modelling. The long series of works on designing the analytic dictionaries lead to appearing various transforms such as Fourier and its discrete version, discrete cosine, wavelets, curvelets, contourlets, bandlets, and steerable wavelets.

A significantly different approach to the sparse modelling, originally introduced by Olshausen and Field [1], consists of learning a dictionary from some training data. The sparse representation using this trained dictionary is called, *synthesis* sparse modelling. The trained dictionaries, also called *synthetic* dictionary, could efficiently capture the underlying structures in the natural image patches and are well adapted to a large class of natural signals.

The analysis and synthesis sparse signal modelling has led to the design effective algorithms for many image-processing applications, such as compression [2, 3, 4, 5, 6, 7, 8] and solving inverse problems [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]. A straightforward application of the sparse signal modelling in the field of image processing has been image compression due to providing a compact representation of the signal. This chapter presents the theoretical and practical aspects of using the dictionary-based sparse signal modelling for image compression. We address one important question: How to efficiently represent an image over a trained dictionary in order to improve the performance of the image compression? Based on this concept, we present two novel image compression methods. Experimental results show that the introduced methods outperform the existing image coding standards, such as JPEG and JPEG2000.

The remainder of chapter is organized as follows: It starts with introducing the concept of sparsity in signal processing. The use of sparse representation modelling with respect to trained dictionaries is presented. Next, a brief description of a well-known and yet effective dictionary learning algorithm, introduced by Aharon et al. [23], is given. The end of chapter is devoted to introducing two generic image compression schemes. Finally, we conclude the chapter.

## 2. Sparsity-based signal models

One of the well-known methods in designing of the signal models is linear approximation. In this modelling technique, given a set of vectors

where set

where

With the right choice of dictionary, the coefficients vector

In an attempt to minimize this weakness of the dictionaries, the design of more general over-complete dictionaries, which have more atoms than the dimension of signal, i.e.

### 2.1 Sparse modelling using over-complete dictionaries

Compared to the complete case, representation with over-complete dictionaries must be more carefully defined. There are two distinct paths for representing a signal using the over-complete dictionaries: analysis path and synthesis path. The *analysis* sparse modelling relies on the classical basics of the signal modelling in which the representation of the signal is identified as a linear combination of atoms,

where the coefficients vector

Increasing sparsity, in order to obtain a well-defined representation, requires departure from this linear representation towards a more flexible and non-linear representation. Each signal is represented using a different set of atoms from a pool, called dictionary, in order to achieve the best sparsity. Thus, the approximation process becomes

where the coefficients vector *synthesis* sparse representation, needs further refinement to find the well-defined representation due to degrees of freedom identified by the null-space of

where

This problem, known to be NP-hard in general, can be efficiently approximated based on the idea of iterative greedy pursuit. The earliest and yet effective one includes the orthogonal matching pursuit (OMP) [28].

This formulation Eq. (6) has gained a large success beyond the statistics and signal processing communities and has been extensively employed in different signal processing algorithms. In the image-processing applications, since the size of natural images is too large, it is chosen to partition the image into blocks and the sparse modelling is done on the set of image blocks

### 2.2 Dictionary choice

In the discussion so far, it is assumed that the dictionaries of the analysis and synthesis models are known. Choosing the dictionary carefully is an important and involving task, in which substantial research has been invested. Based on analysis and synthesis models, the scientific community has developed two main routes for designing the dictionaries [27].

The first one is *analytic* dictionaries derived from a set of mathematical assumptions made on the family of the signals. The dictionaries of this type are generated by finding the appropriate mathematical functions through harmonic analysis of the interesting signals for which an efficient representation is obtained. For instance, Fourier basis is designed for optimal representation of smooth signals, while the wavelet dictionary is more suitable for piecewise-smooth signals with point singularities.

Designing analytic over-complete dictionaries are formulated as

Finding the more compact sparse representation has been a major driving force for the continued development of more efficient dictionaries. The synthesis formulation of the sparse representation paved the way to the design of an efficient dictionary, called *synthetic* dictionaries, from signal realizations via machine-learning techniques. The basic assumption behind this approach is that the structure of the complex natural signals can be more accurately extracted directly from the data than by using a general mathematical model [27]. In fact, this approach replaces prior assumptions on the signal behaviour with a training process which constructs the dictionary based on the observed signal properties. Compared to the analytic dictionaries, the synthetic dictionaries deliver an increased flexibility and the ability to adapt to specific signals and are superior in terms of representation efficiency at the cost of a non-structured and substantially more complex dictionary.

In this approach, a dictionary is trained for the sparse representation of small patches collected from a number of training signals. The desire to efficiently train a dictionary for the sparse representation led to developing some algorithms so far [1, 23, 32, 33, 34]. The earlier works on the dictionary learning mostly focused on statistical methods. Given the training image patches

In neuroscience area, Olshausen and Field [1] proposed a significantly different approach for designing the dictionary using the training data, benefiting from modelling the receptive fields of simple cells in the mammalian primary visual cortex. In another attempt, the K-SVD algorithm introduced by Aharon et al. [23] is one of the well-known methods of dictionary learning. Given a set of examples

where

At the next stage, the columns of dictionary are sequentially updated and relevant coefficients in the matrix

In the image-processing applications, since the size of natural images is too large for learning a full matrix

### 2.3 Analysis versus synthesis

As mentioned before and also outlined in [27], some of the most important elements of effective dictionary design include localization, multi resolution, and adaptivity. Modern dictionaries typically provide localization in both the analysis and synthesis routes. However, multi-resolution property is usually better supported by the analytic structures, whereas adaptivity is mostly found in the synthetic methods.

The most important advantage of the analytic dictionaries is the easy and fast implementation. On the other hands, the main advantage of the trained dictionaries is their ability to provide a much higher degree of specificity to the particular signal properties, allowing them to produce better results in many practical applications such as image compression, feature extraction, content-based image retrieval and others. However, some drawbacks might arise after the compactness introduced by synthesis scheme. In this case, only if a few atoms for presenting a signal, the importance of all atoms largely varies. Consequently, any wrong choice of one atom could potentially lead to additional erroneous atoms that are selected as compensation, deviating further from the desired reconstruction. This weakness is usually appeared in the

## 3. Image compression

Reducing the cost for storage or transmission of image signals with negligible degradation in the quality is the main goal of an lossy image compression algorithm. It tries to remove the redundancies among the image data by adopting different ways, such as transferring the image into a transform domain with compressible coefficients. In the transform domain, a few significant coefficients capture a large part of the image information. A typical lossy image compression algorithm usually encodes these significant transform coefficients to reduce the requirements for image storage or transmission. The analysis and synthesis sparse modellings are two powerful tools for transforming the image into a compressible domain. The JPEG [25] and JPEG2000 standards [26] are the results of using the analysis sparse representation of the image by designing analytic dictionaries, e.g. discrete cosine transform (DCT) and discrete wavelet transform (DWT), respectively. Since analytic dictionaries-based image sparse representation is typically over-simplistic, it fails to represent the high-textured images efficiently [27]. Due to this weakness of analysis sparse modelling in the efficient expressiveness, an extensive body of literature has recently focused on various applications of the synthesis sparse signal modelling via a trained dictionary. In this way, the performance can be significantly improved for the image compression application, benefiting the sparse representation of the image over a dictionary specifically adapted to it.

In order to improve the limitations of the traditional sparse representation approaches over a trained dictionary, several studies have been proposed in the literature [37, 38, 39]. In [37], the authors train a set of dictionaries. In order to improve the compression performance, the sparse representation is achieved by choosing an optimal dictionary among the trained dictionaries. In [38], a set of dictionaries in a tree structure is trained. At each level of tree, a dictionary is learned via generating the image residual. The residual is obtained by difference between the original and recovered images using the trained dictionary at the previous level of tree. Sparse representation is done by selection of one atom from each dictionary at each tree level. The total number of atoms is determined according to the sparsity level. Authors in [39] introduce the concept of multi-sample sparse representation (MSR) and incorporate it into the dictionary learning process. Each image patch is encoded with a certain sparsity level. To do this purpose, multiple neighbouring image patches are considered during the sparse representation to explore different sparsity levels. Based on this concept, an MSR-based image coding approach is proposed in [39].

Using a trained dictionary, each image is represented over the dictionary. The coefficients vector can be obtained via different algorithms such as the basis pursuit algorithms, matching pursuit techniques and other schemes [40]. These conventional approaches usually consider constant sparsity level, i.e. a fixed number of dictionary atoms is considered for representing all the image patches with different characteristics. This approach leads to a weak image compression performance. In this section, an adaptive sparse representation approach is presented. It is based on this fact that the visual significance, called visual saliency, of each image patch varies with its location within the image [41]. In other words, the human visual system (HVS) usually focuses on some parts of the image (salient regions), while other parts of the image have a lower level of visual interest. Therefore, designing an adaptive sparse representation scheme by considering the HVS characteristics plays an important role in designing an efficient image compression algorithm.

### 3.1 Dictionary learning-based image codec

Figure 1 presents the block diagram of the dictionary learning-based image coding (DLC) framework [7]. The DLC mainly has four main parts, pre-processing, dictionary learning, adaptive sparse representation and entropy coding. First, at the pre-processing step, the input image is divided into

#### 3.1.1 Adaptive synthesis sparse representation

The sparse representation step has a strong impact on effectively encoding of image patches and thus the rate-distortion performance. Graph-based visual saliency (GBVS) model, proposed in [42], has an capability to extract the saliency map of an image, i.e. locations within an image where have a high visual interest to a human observer. By incorporating this model into the sparse representation step, we build up an image coding scheme which compresses the image more efficiently than the traditional coders. Please refer to [42] for the details of the GVBS model. This saliency map of the image is exploited to determine the visual significance of the image patches in order to allocate different sparsity levels to each patch according to its visual significance to the HVS.

It should be noted that we normalize the saliency map values to the range

To obtain the saliency value of each image patch, first, the saliency map is partitioned into non-overlapping blocks of size

Consider the sparsity level of

Given the target sparsity level

where

The OMP method in [43] is used to solve this problem. By assigning different sparsity level

#### 3.1.2 Quantization step and entropy coding

Differential pulse-coded modulation (DPCM) scheme is used to quantize the DC elements. Let

In order to further remove the redundancy, the Huffman coder is used to entropy-code the quantized DC values. During the entropy coding step, pre-defined code-word tables are used built offline. The same procedure is employed for quantizing and encoding the nonzero coefficients of

A big part of the bit-stream generated by the above scheme is occupied by encoding the indices of the representation coefficients,

### 3.2 Down-sampling-based codec

In a basic image compression approach, a down-sampling strategy is employed at the encoder side, as another lossy operation. Then, at the decoder side, an up-scaling algorithm is employed to reconstruct the high resolution (HR) image. In this technique, the high-frequency details within the image are removed by employing the down-sampling operator before the quantization lossy operator. Thus low frequency information within an image, where contain most of the energy image, is encoded by higher bit-rate. This bit allocation scheme leads to the better rate-distortion performance at the low bit-rates. Although the down-sampling operator improves the rate-distortion performance at low bit-rates, it eliminates the high frequency (HF) details. These information should be reproduced in order to recover a high quality image at the higher bit-rates.

In the DLC encoder, presented in previous section, the quantization and sparse representation are the main compression tools. In this section, the down-sampling operator is incorporated into the DLC codec as another lossy operator to further remove redundancies existing within an image. Instead of encoding the whole image, its low-resolution (LR) version is encoded. At the decoder side, an up-scaling algorithm is carefully designed to recover the high-frequency details. As discussed before, this strategy enhances the coding efficiency at the low bit-rates. However, designing an efficient up-scaling algorithm is an important step at the decoder side by which the high frequency information can be correctly recovered. A failure in designing a good up-scaling algorithm leads to a weak rate-distortion performance at the high bit-rates. The presented up-scaling scheme in this section recovers the HR image by encoding and sending the residual image, i.e. difference between the original image and the up-scaled one, as side information. The encoder generates the residual image and encodes it by the sparse representation of the residual image over a trained dictionary. Decoder recovers the final image by combination of the LR decoded image and the reconstructed residual. Both dictionaries used for the sparse representation of LR residual images are trained by a bi-level dictionary learning algorithm. Further, an image analyser is designed and incorporated into the encoder. The goal of this image analyser is for designing an adaptive sparse representation in order to assign higher bit-rate to the salient parts of the image. Experimental results illustrate that this down-sampling-based image compression scheme based on sparse representation (DCSR) achieves better rate-distortion performance when compared with the conventional codecs, such as JPEG and JPEG2000, and the DLC codec, described in Section 3.1.

The block diagram of the DCSR encoder is shown in Figure 3. This framework mainly consists of five parts, including down-sampling, up-scaling, sparse representation, quantization and entropy coding [8]. An image analyser is also introduced to the core of codec. At the pre-processing step, the down-sampling operator uses a blurring convolution kernel and a simple decimator by three to produce the LR image with only 1/9 of total pixels of the original image. At the up-scaling step, the down-sampled image is restored back to its original resolution. The goal of the up-scaling step is designing an accurate algorithm to improve the rate-distortion performance at the higher bit-rates. A joint sparse representation method is employed to solve this ill-posed, complex inverse problem. This method guarantees to restore the high frequency details which are enough and sufficient for reconstruction of the HR image.

Assume the LR image,

At the sparse representation step, the LR image patches *i*-block in the LR image

At the decoder side, the image is reconstructed by a minor application of the above steps. First, the reconstructed LR image

#### 3.2.1 Image analyser for adaptive sparse representation

The main goal of image analyser is to enhance the performance of the DCSR encoder. It reduces the reconstructed error in some parts of the image which have higher visual interest to a human observer. Therefore, the image analyser extracts the salient regions within the image according to its visual significance to the HVS. Then, it allocates higher rates to the salient regions by assigning different sparsity levels to the image patches, located within the salient parts, in both LR and residual images. As mentioned before, the GBVS model [42] is used to obtain the salient regions within an image. Some examples of saliency maps are shown in Figure 2.

For finding the sparsity levels of the image patches in the LR image, the obtained saliency map is re-sized to the same size of the LR image. Then, it is divided into

where

The same procedure is applied to obtain the sparsity levels

The OMP method is used to solve these problems. At the low bit-rates, DCSR encodes only the LR image. At the higher bit-rates, DCSR encodes the residual image and send it as side information in order to improve the coding efficiency. In this case, the patches within the residual and LR images are represented over the corresponding dictionaries with the same target sparsity level

#### 3.2.2 Joint dictionary-mapping learning for up-scaling

The goal of up-scaling algorithm is to recover the high frequency information eliminated during the down-sampling operation. One well-known up-scaling approach is to learn a function which maps the LR and HR image patches. This mapping function can be obtained using two training databases of the LR and HR image patches. Instead of finding the direct mapping function between the LR and HR patches, authors in [44] show that the mapping function can be efficiently learned in the transform domain. In this approach, first, the LR and HR patches are transferred into the representation space over the trained dictionaries and then the relationship between representation coefficients is learned.

Let two sets

where

After training process, the up-scaling is achieved by the dictionary pair (i.e.

where

#### 3.2.3 Bi-level dictionary learning algorithm for image compression

The up-scaling algorithm, described in previous section, leads to a bi-level dictionary learning algorithm which used for image compression at the DCSR encoder. Note that the trained dictionary

#### 3.2.4 Quantization and entropy coding

Differential pulse-coded modulation (DPCM) is used to quantize the non-zero elements in the sparse representation vectors

The location of non-zero coefficients in

### 3.3 Experimental results

In this section, the rate-distortion performance of the DCSR codec is examined by performing a suite of experiments on a set of 8-bit grey-scale standard images. All evaluated images are re-sized to size of

Block size is set as ^{1} for training. All these training images are partitioned into

Next, we present the rate-distortion graph for the test images in Figures 4 and 5. Note that we also provide the results for different baseline algorithms, i.e. JPEG,^{2} JPEG2000,^{3} and DLC algorithm [12]. As these figures shows DLC and DCSR codecs provide a better performance (in terms of PSNR and SSIM) than the available image coding standards JPEG and JPEG2000. Interestingly, as the Figure 4 demonstrates, the DCSR algorithm enhances the quality of the image over the JPEG2000 codec. The enhancement depends on the statistics of the image, so different enhancement is observed in different images. Please note that SSIM shows the perceived quality of the compressed image, therefore, SSIM curves for DCSR and JPEG2000 are very close to each other. For a better visualization, we removed the rate-distortion performance for JPEG2000 in Figure 5. As the trained dictionary performs a better capture and presents the contours more accurate, DCSR codec presents a higher performance compared to the analytic dictionaries such as the wavelet transform basis. Moreover, a 0.2 dBs improvement is observed using the DLC algorithm, marginally better for simpler images, like *Mandrill*.

Several compressed images are shown in Figure 6 for visual comparisons. All images are coded with the same bit-rate (0.15) using JPEG and DCSR algorithm. As it can be seen, the JPEG standard fails to reconstruct images compressed at very low bit rates. In comparison, DCSR algorithm can preserve more image details. It is more effective in reconstruction of both the smooth area and the complex regions, including texture and edges, leading to visually much more pleasant recovery.

As a consequence, the down-sampling-based coding results in the high-frequency details in the image before performing the quantization lossy, hence larger number of bits are assigned low frequency information occupying more energy in the image. The presented bit assigning strategy brings a higher performance for images coded at low bit-rate. In higher bit-rates, the efficiency of coding is enhanced after the residual image is encoded as side information.

## 4. Conclusions

A brief summary of the signal modelling methodology has been given at the first part of chapter. We continued with the explicit and straightforward formulation of the sparse representation being more suitable for the compression tasks. We have just focused on the synthesis-based signal modelling because of being mature of the image compression using the analysis-based sparse signal modelling. An adaptive sparse representation over a trained over-complete dictionary was presented to compress the images. More specifically, given the saliency map of the image to be encoded, an image patch could be well represented with the linear combination of atoms selected from an over-complete and trained dictionary based on the sparsity level. Finally, at the end part of this chapter, a down-sampling operation is incorporated into the codes in order to improve the compression performance. The experimental results demonstrated that the presented image compression frameworks outperform image coding standards, such as JPEG and JPEG2000, which use an analytic dictionary.

## Notes

- 1http://decsai.ugr.es/cvg/dbimagenes
- http://www.ijg.org
- http://www.openjpeg.org