Open access peer-reviewed chapter - ONLINE FIRST

Image Restoration and Noise Reduction with Context-Dependent Wavelet Graph and ADMM Optimization

Written By

Jinn Ho, Shih-Shuo Tung and Wen-Liang Hwang

Reviewed: January 14th, 2022 Published: February 27th, 2022

DOI: 10.5772/intechopen.102681

Recent Advances of Wavelet Transform and Their Applications Edited by Francisco Bulnes

From the Edited Volume

Recent Advances of Wavelet Transform and Their Applications [Working Title]

Dr. Francisco Bulnes

Chapter metrics overview

32 Chapter Downloads

View Full Metrics


We represent the image noise reduction and restoration problems as context-dependent graphs and propose algorithms to derive the optimal graphs by the alternating direction method of multipliers (ADMM) method. An image is spatially decomposed into smooth regions and singular regions, consisting of edges and textures. The graph representing a smooth region is defined in the image domain, while that representing a singular region is defined in the wavelet domain. The optimal graphs are formulated as the solutions of constrained optimization problems over sparse graphs, where the sparseness is imposed on the edges. The graphs on the wavelet domain are solved in a hierarchical layer structure. The convergence and complexity of the algorithms have been studied. Simulation experiments demonstrate that the results of our algorithms are superior to the state-of-the-art algorithms for image noise reduction and restoration.


  • image restoration
  • image denoising
  • graph
  • ADMM
  • wavelet

1. Introduction

We consider the inverse problem of deriving the original image xRNfrom an observation yRN, expressed as


where His an N×Nmatrix and nN0N×1σIN×Nis a vector of independent and identically distributed (i.i.d.) Gaussian random variables with standard deviation σ. We further assume that the point spread function His known. By imposing prior information on the desired image, given as


where the first term is the data fidelity term for the Gaussian observation model and the second term is the regularization term, measuring the penalty of a solution that deviated away from the prior knowledge of the desired image. The modeling of the desired image is at the core of the approach [1, 2, 3, 4]. The primary challenge of solving the problem is to recover the local high-frequency information of edges and texture in the original images that are not present in the observation.

If His the identity matrix, problem (1) is called the noise reduction problem. The solutions vary with what type of noise is contaminated in the observation [5]. If the noise is white Gaussian noise, the state-of-the-art algorithms for the problem are BM3D [6], WBN [7], and DRUNet [8]. BM3D utilizes the tight frame representation of an image, where atoms of the frame are derived from image patches. WBN is a graphical probabilistic model of a weighted directed acyclic graph (DAG) in the wavelet domain. Different from BM3D and WBN, DRUNet is a deep learning method. It is a flexible and powerful deep CNN denoiser and the architecture is the combination of U-Net [9] and ResNet [10]. It not only outperforms the state-of-the-art deep Gaussian denoising models but also is suitable to solve plug-and-play image restoration.

If His a blur singular matrix, problem (1) is called the image restoration problem. In the optimization-based method, the best image restoration performance both subjectively and objectively was derived from the algorithm IDD-BM3D [11]. It utilizes sparse synthetic and analytic models and de-couples the problem into blur inverse and noise reduction sub-problems, each of which is solved by a variational optimization approach. In deep learning, DPIR [8] replaces the denoising sub-problem of model-based optimization with a learning-based CNN denoiser prior which is DRUNet. By iteratively solving the data sub-problem and a prior sub-problem to restore the image.

In this chapter, we also present a restoration algorithm that combines the noise reduction algorithm with the proximal point method [12]. The primary technical contributions of our methods are the context-dependent graphical representations and the algorithms to derive the optimal graphs of each representation. Finding the optimal graph in a combinatorial way is extremely difficult and likely an NP-hard problem [13, 14]. Unlike the combinatorial approach, we impose constraints on edges and include edges in the optimal graph only when the constraints on the edges are active. This renders a computationally solvable optimization problem and the solution is a graph with only a small number of active edges.

Based on local content in an image, the context-dependent representation divides the image into singular and smooth areas. Singular areas, consisting of edges or texture, are represented and processed differently from the smooth areas. The graphs of singular areas are constructed based on the persistence and sparsity of wavelet coefficients of the image. The persistence is imposed on the inter-scale edges so that the solution at one scale can be used to confine that in adjacent scales. Meanwhile, the sparsity is imposed on the intra-scale edges that preserve the edges in which end nodes have similar intensity. In contrast, a graph of a smooth area is in the image domain and has only sparse intra-scale edges.

The algorithm to derive the optimal graphs, called graphical ADMM, is based on the alternating direction method of multipliers (ADMM) method [15, 16]. It is an efficient and robust algorithm since it breaks a complicated problem into smaller pieces, each of which is easier to handle. In our case, the node update is separated from the edge update in the optimization. In addition, for wavelet graphs, graphical ADMM approximates the multi-scale optimization problem into a sequence of sub-problems; each can be efficiently solved by convex optimization methods.

The chapter is organized as follows. In Section 2, we present the models and the construction for the context-dependent graphs. In Section 3, we formulate the noise reduction problem as a graph optimization model and present the graphical ADMM method to derive optimal graphs. In Section 4, the image restoration problem is formulated as the proximal point method that reduces the problem into a sequence of noise reduction problems, each being solved by the method in Section 3. In Section 5, experimental results and the principal differences between our and the compared methods are also discussed. Section 6 contains concluding marks.


2. Context-dependent graphical models

An image is comprised of features of edges, texture, and smooth areas. A common approach to obtain a good image processing result is to treat different features with different approaches [17, 18]. Following this approach, an image is partitioned into two types of blocks. A block containing an edge point or texture is a singular block, while the others are smooth blocks. To keep the flow, we delay the partitioning method of an image, which is described in part A of Section 5, but this is not necessary to accurately partition an image to achieve the performance demonstrated in this chapter. The singular and smooth blocks were handled with different graph optimization approaches: a singular block is in the wavelet domain, while a smooth block is in the image domain. In the wavelet domain, a singular block is represented by several weighted graphs, one corresponding to an orientation. If the wavelet transform has three orientations, LH, HL, and HH, then one graph is for LH sub-bands, another for HL sub-bands, and the third for HH sub-bands. The graph for one orientation is constructed as follows.

Each wavelet coefficient is associated with a node. Edges are comprised of inter-scale and intra-scale edges. An inter-scale edge connecting nodes in adjacent scales can direct either from a coarse scale to a finer scale or vice versa. The inter-scale edges are built-in and data-independent; they are constructed based on the wavelet persistence. In contrast, an intra-scale edge connecting nodes of the same scale is un-directed, data-dependent, and determined based on the sparseness from the graph optimization algorithm. Regularizations have been imposed on inter-scale edges to preserve the persistence of wavelet coefficients across scales and on intra-scale edges to preserve the similarity of wavelet coefficients on nodes at the two ends of an edge.

2.1 Inter-scale edges

Since wavelets can characterize singularities, representing singularities with wavelets can facilitate the restoration of edges and texture in an image. The persistence property of wavelets means that the wavelet coefficients dependency and correlations across scales. Thus, inter-scale edges were constructed to link the wavelet coefficients of the same orientation and locations at adjacent scales. Moreover, the correlations of wavelet coefficients from a coarser scale to a finer scale are different from that from a finer scale to a coarser scale. There are two types of inter-scale edges—coarse-to-fine and fine-to-coarse. The coarse-to-fine inter-scale correlation is derived based on the statistical result by Simoncelli [19], who analyzed the correlation between the dyadic wavelet coefficients in a coarse scale to those at the same location and orientation at the immediate finer scale in a natural image. The coarse-to-fine inter-scale correlation of wavelet coefficient wpiat a coarse scale and wavelet coefficient wcat the immediate finer scale can be represented in terms of minus log-probability as


where k1is a parameter. Thus, given the wavelet coefficients wpiat the coarse scale, Eq. (3) gives the minus log probability of the wavelet coefficient wiat the same location and orientation at the immediate fine scale.

On the other hand, the fine-to-coarse inter-scale correlation is derived from the theoretical result of wavelet singularity, analyzed by Mallat and Hwang [20]. Let wiand wcibe wavelet coefficients corresponding to the same singularity at different scales. Then, wiand wcihave the same sign and the correlation, in terms of the ratio of the modulus of wavelet coefficients, from wciat a fine scale to wiat the immediate coarser scale can be expressed as


where αis the Lipschitz of the singularity. If the singularity is a step edge, then αis 0. The exponent α+12in Eq. (4) depends on how a wavelet is normalized. Here, the wavelet is normalized to have unit 2-norm.1Eq. (4) can also be expressed in terms of minus log-probability as


where k2is a parameter. If the type αof the singularity is known, given the wavelet coefficient at the finer scale, wci, Eq. (5) gives the minus log-probability of the wavelet coefficient wiat the coarse scale. Since step edges are the most salient features to be recovered from an image, in this chapter, we set αto 0.

2.2 Intra-scale edges

A coherent or similar structure can be used to leverage the quality of the restoration [2, 21]. This is the principle behind the success of BM3D and the example-based approach in image processing [22]. Many similarity metrics have been proposed to derive the coherent structure, such as the mutual information, the kernel functions, and the Pearson’s correlation coefficient. In this chapter, the Pearson’s correlation coefficient is modified for some technical concern to derive the intra-scale correlation of random variables Xand Y:


where μXσXand μYσYare the mean and the standard deviation of Xand Y, respectively, and q>0is the offset, introduced to avoid dXY=0in inequality constraints in Eq. (8). The value of dXYlies in q1+q, measuring the similarity of Xand Y. The smaller dXYis, the more the independence of Xand Yand the less likely Xand Yhave coherence structure. As shown in Figure 1, the coherence structure in an image is measured on all the coefficients of intra-scale edges in which endpoints take the same locations within a block in a sub-band.

Figure 1.

An illustrative example of how the metricdh,lis measured. There are 12 blocks in a sub-band. All the intra-scale edges between theh-th and thel-th nodes are collected. The coefficients on theh-nodes form the random variableXand those on thel-th nodes form the random variableY.dh,lis then measured based onEq. (6).

The intra-scale edges are determined based on the sparsity constraint aiming to preserve the edges in which end nodes have similar values. The number of the edges is determined by the parameter:


where dyhylis defined in Eq. (6) and obtained from the observation image, and xhand xlare the coefficients at the h-th and l-th nodes, respectively. If the observed values yhand ylare similar, the value of dyhylis large, then xhxlwould be small to satisfy the constraint. This preserves the intensities between xhand xh. Only the edges satisfying Eq. (7) are retained in the optimal graph. In the following, dh,lis used to simplify the notion dyhylin Eq. (7).

2.3 Graph construction

The aforementioned are integrated and summarized for our context-dependent representation of an image. An image is divided into blocks. Each block is classified as either a singular block or a smooth block. A singular block is then represented with the dyadic wavelet transform, where the scale is sampled following a geometrical sequence of the ratio of 2 and the spatial domain is not down-sampled. The dyadic wavelet transform of an image is comprised of four sub-bands, LL, LH, HL, and HH, with the last three being the orientation sub-bands. A singular block is associated with three graphs—one for each orientation sub-bands. Since smooth blocks can be well restored in the image domain, the wavelet transform is not applied to the blocks and a smooth block is associated with a graph in the image domain. Each graph, associated with a singular block or a smooth block, is constructed independently of other graphs. Figure 2 illustrates an example of graph representation for a block of four pixels.

Figure 2.

A block of four pixels can be a smooth block (top) or a singular block (bottom). A smooth block is processed in the image domain. A singular block is in the wavelet domain, where a multi-scale graph is associated with each orientation. The blue and green are built-in directed inter-scale edges and the red are un-directed intra-scale edges. The inter-scale edge connects nodes at the same locations and orientation but at scales next to each other. The green edge is coarse-to-fine, linking a node to its parent, while the blue edge is fine-to-coarse, linking a node to its child. The intra-scale edges are determined by graphical ADMM, which decomposes the node update and edge update in the optimization.


3. Optimal graphs for noise reduction

The noise reduction problem corresponds to problem (1), where His the identity matrix. Since each graph is solved independently of the other graphs, the following discussion is focused on one graph. A graph can be associated with a singular block or a smooth block.

3.1 Singular blocks

Let yiand xibe the wavelet coefficients associated to the i-th node in a sub-band of the observation image and the original image, respectively. Its parent node is denoted as piand child node is ci. Let zh,lbe the variable defined on the intra-scale edge that connects the h-th and l-th nodes, which are at the same scale and orientation. The optimal wavelet graph can be derived by solving


where r, σ, k1, and k2are non-negative parameters and the constraints are designed as explained under Eq. (7), where rcontrols the number of edges in the optimal graph. If a node is at the coarsest scale 2J, where Jis the number of decomposition of the wavelet transform, it does not have a parent node and the second term in the object of Eq. (8) is zero; the third term is set to zero for a node at the finest scale 21as it does not have a child node.

Problem (8) has a convenient matrix representation. Let x=xiand z=zh,lbe the vectors of variables on nodes and intra-scale edges, respectively. Then, the linear constraints between zh,land xhand xlin Eq. (8) can be expressed as


where Ais a matrix with elements either 1, 1, or 0. Let A=AxAz. If follows that


Each row of Axhas one element which value is 1and another element which value is 1and the rest of value 0. Meanwhile, each row of has one element of values and the rest of value . Let λ=λh,l02 and μbe the vectors of Lagrangian variables associated to inequality constraints and the equality constraints in Eq. (8), respectively. The augmented Lagrangian of Eq. (8) is


where 1is a vector with all members being 1; and ρ>0are fixed parameters.

The ADMM algorithm intends to blend the decomposability of dual ascent with the superior convergence properties of the method of multipliers. Here, ADMM is used to derive the optimal graph by separating the node and edge update. Graphical ADMM derives the saddle points of


by iterating the following updates:


where Qand P0are the orthogonal projections to satisfy constraints dj,lzh,l1rand λh,l0, respectively; and ε>0is the stepping size for the dual ascent of Lagrangian variables λ. The first and second updates in Eq. (13) update the node variables and edge variables, respectively. The update of the dual variables λis derived based on the necessary conditions at an optimum of Eq. (8) that


The third update in Eq. (13) has the following interpretation. If rdh,lzh,lk+1>0, then λh,lkwill decrease and keep its value to be non-negative by P. The value of λh,lcan be repeatedly decreased by increasing the iteration number kuntil either λh,l=0or rdh,lzh,lk+1=0, where the edge is active, and the optimal conditions (14) satisfy. At optimum, either the Lagrangian associated with an edge is zero or the constraint on the edge is active. Only the active edges are retained in the graph. Since the number of active edges is sparse, the edges in the optimal graph are sparse. The solutions for the updates rules for primal variables are derived in Sections 3.1 and 3.2, respectively.

3.2 Smooth blocks

A smooth block is processed in the image domain, where each pixel is associated with a node in a graph. Problem (8) becomes finding the optimal graph by solving


The optimal graph can be derived by a method similar to that for a singular block.

3.3 Update edges

The update rule for edge variables zis to solve, with fixed x,λ,and μ,


If only the terms in Eq. (3) relevant to the optimization variables, zh,l, are concerned, Eq. (16) becomes


because dh,lis non-zero as defined in Eq. (6). Equation (17) indicates that each edge variable can be updated independently of the others.

We first solve Eq. (17) without the constraint and obtain the solution uh,lof


by soft-thresholding with


It is then followed by orthogonally projecting uh,lto satisfy the constraint by solving


Eq. (20) can be solved by a sequence of soft-thresholding operations. The algorithm is sketched as follows. First, we check whether uh,lrdh,l. If it is, uh,lis the solution. Otherwise, we begin with a small γand solve


The solution is the soft-thresholding as


If zh,l+rdh,l, zh,lis updated to zh,l+, the algorithm stops. Otherwise, γis increased and Eq. (22) is solved again. Since increasing γdecreases zh,l+, this algorithm always stops and updates the edge variable zh,lto meet the constraint.

The complexity to update edge variables is analyzed. If the number of pixels of a block is nand if the dyadic wavelet transform takes Jscales, then the number of edge constraints on a singular block is OJn2and the number of edge constraints on a smooth block is On2. Let K1be the maximum number of iterations to derive the solution for Eq. (20) for all graphs. The complexity of one edge update is OK1n2JF+3JWJ, where 3is the number of orientations, JFand JWare numbers of singular blocks and smooth blocks, respectively.

3.4 Update nodes

The node update for a singular block is more complicated than that for a smooth block because a graph for a singular block has a multi-scale structure, where adjacent scales are linked by inter-scale edges.

3.5 Singular blocks

To update the nodes xin a singular block is to solve the augmented Lagrangian function (3) via


for given z,λ, and μ. This is not a convex problem because the second term in Eq. (3) is non-convex.

Our approach is to decompose the problem based on the scale parameter into a sequence of sub-problems. Each scale is associated with two convex sub-problems: one is a coarse-to-fine sub-problem and the other is a fine-to-coarse sub-problem. The coarse-to-fine sub-problem assumes the parent nodes at scale 2s+1were updated earlier, while the fine-to-coarse sub-problem assumes the child nodes at scale 2s+1were updated earlier. Let kbe the current iteration number. The course-to-fine sub-problem updates the nodes at scale 2sby minimizing3


On the other hand, the fine-to-coarse sub-problem updates the nodes at scale by minimizing4


The node update problem (23) can then be approximated by repeatedly solving the coarse-to-fine iteration followed by the fine-to-coarse iteration. The coarse-to-fine iteration solves a sequence of the coarse-to-fine sub-problems beginning at the coarsest scale. In contrast, the fine-to-coarse iteration solves a sequence of the fine-to-coarse sub-problems beginning at the finest scale.

Problems (24) and (25) can be efficiently solved. The objectives in the sub-problems are strictly convex functions because their Hessian matrices are positive definite (as can be observed from the inverse matrix of Eqs. (26) and (27)) and, thus, the optimal solution of each is unique. The closed-form solutions of the sub-problems can be derived as follows.

For convenience, we omit all the superscript index in Eqs. (24) and (25) and let Axsand Azsdenote the sub-matrices of Axand Ax, respectively. Axsand Azsretain only the rows and columns in Axand Axcorresponding to the nodes and edges at scale 2s, respectively. We also let xs, zs, and μsdenote the vectors of nodes, edges, and Lagrangian variables at scale 2s. The closed-form solution of xsof the coarse-to-fine sub-problem is


where Cs+1is a diagonal matrix which diagonal element at iiis k12xpiand 2Jis the coarsest scale. On the other hand, the closed-form solution of xsfor the fine-to-coarse sub-problem is


The complexity of the matrix inversion in Eqs. (26) and (27) is low since Axsis a sparse matrix and each row of Axshas at most one 1and one 1and the rest are zero. The complexity of the sparse matrix inversion in Matlab is proportional to the number of non-zero elements in the matrix. Thus, one iteration of either coarse-to-fine or fine-to-coarse of a graph takes the complexity OJn, where Jis the number of decomposition and nis the number of pixels at a scale.

3.6 Smooth blocks

The node update for a smooth block can be analytically derived from the problem (15) at the condition that zh,lis given. If the superscript index is omitted, the closed-form solution is


where Axand Azare defined in Eq. (10). The complexity of the inversion of the sparse matrix 1σ2I+ρAxTAxis propositional to the non-zero elements in the matrix, which is On, where nis number of pixels in a block.

If an image has JWsingular blocks and JFsmooth blocks, and if Jis the number of wavelet decompositions, the total complexity of node updates of the image is On3K2JJW+JF, where K2is the maximum number of iterations of coarse-to-fine and fine-to-coarse node update for a singular block.

Graphical ADMM consists of a sequence of updating the primal and dual variables and the complexity of the algorithm is dominated by the primal variable updates. Our analysis of one iteration of Eq. (13) for node updates and edge updates indicates that the costs are On3K2JJW+JFand OK1n23JWJ+JF, respectively, where nis the number of pixels in a block.


4. Optimal graphs for image restoration

There are various image restoration methods [23]. Here, we use the proximal approach proposed in [24] and [12]. The method smartly reduces the image restoration problem into a sequence of noise reduction problems. Since graphical ADMM for noise reduction is efficient, it can be adopted to derive the optimal graphs for image restoration. Like for noise reduction, a graph is handled independently of the other graphs. The following discussion is focused on deriving the optimal graph for a block.

Let hxbe the objective function


with a known blur kernel H; x0is the vector of the current restored image. The proximity function is defined as


The parameter βis chosen so that dHxx0is strictly convex with respect to x. This implies that its Hessian βIHTHis a positive definite matrix, which can be achieved by choosing β>λmaxHTH(the maximal eigenvalue of the matrix HTH). The proximal objective is defined as


Simplifying the above objective, we have the following simpler form by removing Hxfrom the proximal objective as


where Kcontains terms unrelated to x. Since x0,q,H,yare given, the proximal objective can be regarded as a noise reduction problem with the observation vector, x0+1qHTyHx0. Thus, h˜xx0can be the first term in noise reduction problem (8) and the algorithm for noise reduction can be used to derive the optimal graph via separating the node and edge updates following Eq. (13) and procedures in Section 3.


5. Experiments and comparisons

We consider several image denoising and deblurring scenarios used as the benchmarks in state-of-the-art algorithms for performance evaluations and comparisons. The setting of experiments is given as follows. The experiments were conducted on images in Sets I and II in Figure 3. Set I contains six gray-scaled natural images, Einstein, Boat, Barbara, Lena, Cameraman, and House. The size of each image is 512×512or 256×256, downloaded from the USC-SIPI image database [25]; and Set II contains six gray-scaled textures. Some of them were taken from the Brodatz texture set. Through all experiments, each image is divided into 16 equal-sized blocks. A singular block is decomposed into four scales dyadic wavelet transform with the CDF 9/7 wavelet filters. Since the CDF 9/7 filters are close to orthogonal wavelet filters, the noise variance at any sub-band can be set to σ2, the variance of noise in the image domain [7].

Figure 3.

The images in set I (first row) and set II (second row). Set I contains six gray-scaled natural images and set II contains six gray-scaled textures.

5.1 Noise reduction performance

Our noise reduction performance was compared against that of BM3D, WBN, and DRUNet. The perceptual quality of the methods is shown in Figure 4. The Lena image of BM3D over-smooths the highlighted area of hat, which is rich in edges and textures. Similarly, textures in the highlighted area of hat in DRUNet are smooth. The image of WBN, on the contrary, under-smooths the highlighted smooth area around the chin and shoulder of Lena. These artifacts have been amended by graphical ADMM, as shown in Figure 4f.

Figure 4.

Comparisons of the denoised Lena images derived by BM3D, WBN, DRUNet, and graphical ADMM. The noise standard deviation is set atσ=25: (a) the original512×512Lena image; (b) the noised image; (c) the result of BM3D; (d) the result of WBN; (e) the result of DRUNet; and (f) the result of graphical ADMM. Graphical ADMM preserves both the smooth and edged areas in the original image, as shown in the highlighted areas.

The quantity comparisons, measured by the peak-signal-to-noise ratio (PSNR), of Set I and Set II are shown in Tables 1 and 2, respectively. The testing environments were images contaminated with the noise of variances, σ2. As shown, the deep learning-based method (DRUNet) achieves the highest score in almost all environments. However, in the optimization-based methods (BM3D, WBN, proposed), graphical ADMM achieves unanimously the highest score in all environments.


Table 1.

Comparisons of the PSNRs of the noise reduction methods on noisy images with the noise of standard deviation σin set I.


Table 2.

Comparisons of the PSNRs of the noise reduction methods on set II texture images.

5.2 Image restoration performance

Table 3 presents five-point spread functions (PSFs) used for image restoration in literature [11]. Each PSF was normalized to have unit 1-norm before it was used to blur an image. The performance was compared with the state-of-the-art methods, IDD-BM3D and DPIR. To have fair comparisons, both methods used the same initial images in each experiment. The visual quality of the restored images is shown in Figure 5 and the blue and red boxes are magnifications of the highlighted areas in the image. Compared with the original images, the overall perceptual quality of the images of IDD-BM3D and DPIR appear over-smoothed, whereas graphical ADMM preserves more image details, leading to better perceptual quality. Graphical ADMM can preserve more details because it uses the multi-scale approach in treating the texture and edge regions. The wavelet persistence property allows information at coarse scales to pass to fine scales and vice versa. As a result, graphical ADMM yields shaper results in recovering singular points in images.

Blur KernelFormulationsize
h4Gaussian (σ=1.6)25×25
h5Gaussian (σ=0.4)25×25

Table 3.

Blur kernels for experiments. The dxand dyin h1are, respectively, the horizontal and vertical distances of a pixel to the center of the blur kernel.

Figure 5.

Comparisons of the deblurred images. The blue and red boxes are the magnified areas in the image. (a) the original512×512boat image; (b) the blurred image with blur kernelh4; (c) the image of IDD-BM3D; (d) the image of DPIR; and (e) the image of graphical ADMM. The overall perceptual quality of our image is better since that of IDD-BM3D and DPIR are over-smoothed.

The quantity comparison is shown in Figure 6, where the performance improvement of graphical ADMM over IDD-BM3D was measured by the ISNR (increased signal-to-noise ratio) [26]. The ISNR quantitatively assesses the restored images with known ground truths. Let y, x, and x0be the vector representations of the observation, the restored image, and the ground truth, respectively; the ISNR is defined as

Figure 6.

Average and standard deviation of the ISNR gain of DPIR over IDD-BM3D and graphical ADMM over IDD-BM3D. Each image is blurred and then added to white noise of standard deviation indicated by the noise level. The image was then deblurred. An ISNR gain was calculated from the de-blurred images. The circled point and bar of a measurement at a noise level are the average and standard deviation, respectively, of thirty ISNR gains of natural images from set I ((a) DPIR over IDD-BM3D and (b) graphical ADMM over IDD-BM3D) and texture images from set II ((c) DPIR over IDD-BM3D and (d) graphical ADMM over IDD-BM3D). As shown, the curves of ISNR gain increase steadily and progressively when the noise level increases.


The higher the ISNR value of a restored image, the better the restoration quality of the image. The ISNR gain of graphical ADMM over that of IDD-BM3D is defined as


and the ISNR gain of DPIR over that of IDD-BM3D is defined as


Figure 6a and c show the ISNR gain of DPIR over IDD-BM3D in Set I and Set II, respectively. Figure 6b and d show the ISNR gain of graphical ADMM over IDD-BM3D in Set I and Set II, respectively. Let us take Figure 6b as an example. At a noise level, each image in Set I was first blurred by a kernel in Table 3. The result was added to white noise to obtain a noisy blurred image. This procedure generated thirty noisy blurred images since Set I contains six images and Table 3 has five blur kernels. Each noisy blurred image was deblurred. The ISNR gain of the image obtained by graphical ADMM and that by IDD-BM3D was calculated. The thirty ISNR gains were then used to calculate the mean and standard derivation, as shown in Figure 6. The mean ISNR gain of graphical ADMM increases steadily and progressively over IDD-BM3D, as the noise level increases.

5.3 Discussions

DRUNet and DPIR are deep learning methods and the training data with a noise level of σranges from 0to 50. In the experiments, they have the best performance in quantity comparisons but graphical ADMM is the best in visual quality. For learning-based methods, the training data is important and related to the performance. The inferred results are data-driven and not interpretable. If the training data is less or the distribution of the testing data is not similar to training, the performance will be worse. If the artifacts occurred in the results, we do not know how it happened because the network is just composed of many coefficients trained from the training data. At the same time, the time cost for training is very high. These are all the drawbacks of learning-based methods. However, the optimization-based methods are not limited to the training data. The results are derived from the objective function and interpretable. In addition, they are more stable in practical applications. So, there is a trade-off between learning and optimization-based methods.

For the optimization-based methods, the experiments have demonstrated the advantages of graphical ADMM in both the noise reduction and image restoration tasks over the compared methods. Recall that BM3D and IDD-BM3D adopt the image-dependent tight frame representations. IDD-BM3D also combines the analytic and synthetic optimization methods by de-coupling the noise reduction problem and the image restoration problem. This yields a game-theoretical approach that two formulations are used to minimize a single objective function. The solution adopted by IDD-BM3D is a Nash equilibrium point. The WBN represents an image as a multi-scale probabilistic DAG and adopts the belief propagation to derive the MAP solution.

The advantages of graphical ADMM lie in the context-dependent decompositions of an image horizontally in space and vertically along the scales in handling the image details. The spatial decomposition allows our method to overcome the cons of under-smoothing the smooth areas in WBN and keeps the pros of WBN that preserves sharp edges. Meanwhile, graphical ADMM is much more efficient than the time-consuming belief propagation adopted in WBN.

The mixture of data-dependent and data-independent edges in wavelet graph construction is a significant feature of our method. The intra-scale edges are determined by a data-dependent adaptive process, which imposes sparseness by keeping the edges which end nodes have similar coefficients in the optimal graph. The inter-scale edges are data-independent, built-in to leverage the wavelet persistence property. The inter-scale edges, passing information of singularities from finer scales to coarser scales and vice versa, can preserve more texture and edges in original images. This distinguishes our algorithm from BM3D and IDD-BM3D, which encode structure in atoms of a dictionary and select a few atoms for image representation.


6. Conclusions

We present a novel approach by combining spatial decomposition, vertical (multi-scale) decomposition, and ADMM optimization in a graphical framework for image noise reduction and restoration tasks. The graphical ADMM method has demonstrated that its results are superior to those of state-of-the-art algorithms. We also demonstrated that mixing data-dependent and data-independent structures in a graph representation can leverage the sparseness and persistence of a wavelet representation. Rather than adopting a combinatorial approach to derive an optimal graph, we showed that the graph can be derived by a numerically tractable optimization approach. In addition, we showed that the optimization problem is well coupled with our graph representation, and can be decomposed into a sequence of convex sub-problems, with each having an efficient closed-form solution. This opens a new perspective of combining a mixture of data-adaptive and data-independent structures, hierarchical decomposition, and optimization algorithms in modeling, representing, and solving more image processing tasks.


  1. 1. Rudin L, Osher S, Fatemi E. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena. 1992;60(1–4):259-268. DOI: 10.1016/0167-2789(92)90242-F
  2. 2. Buades A, Coll B, Morel J. A review of image denoising algorithms, with a new one. Multiscale Modeling & Simulation. 2005;4(2):490-530. DOI: 10.1137/040616024
  3. 3. Elad M, Aharon M. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing. 2006;15(12):3736-3745. DOI: 10.1109/TIP.2006.881969
  4. 4. Chatterjee P, Milanfar P. Is denoising dead? IEEE Transactions on Image Processing. 2010;19(4):895-911. DOI: 10.1109/TIP.2009.2037087
  5. 5. Li J, Shen Z, Yin R, Zhang X. A reweighted l2 method for image restoration with poisson and mixed poisson-gaussian noise. Inverse Problem and Imaging. 2015;9(3):875-894. DOI: 10.3934/ipi.2015.9.875
  6. 6. Dabov K, Foi A, Katkovnik V, Egiazarian K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing. 2007;16(8):2080-2095. DOI: 10.1109/TIP.2007.901238
  7. 7. Ho J, Hwang W. Wavelet Bayesian network image denoising. IEEE Transactions on Image Processing. 2013;22(4):1277-1290. DOI: 10.1109/TIP.2012.2220150
  8. 8. Zhang K, Li Y, Zuo W, Zhang L, Gool L, Timofte R. Plug-and-play image restoration with deep denoiser prior. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2021. DOI: 10.1109/TPAMI.2021.3088914
  9. 9. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention. 2015:234-241. DOI: 10.1007/978-3-319-24574-4_28
  10. 10. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition. 2016:770-778. DOI: 10.1109/CVPR.2016.90
  11. 11. Danielyan A, Katkovnik V, Egiazarian K. BM3D frames and variational image deblurring. IEEE Transactions on Image Processing. 2012;21(4):1715-1728. DOI: 10.1109/TIP.2011.2176954
  12. 12. Zibulevsky M, Elad M. L1-L2 optimization in signal and image processing. IEEE Signal Processing Magazine. 2010;27(3):76-88. DOI: 10.1109/MSP.2010.936023
  13. 13. Heckerman D. A Tutorial on learning bayesian networks. In: Innovations in Bayesian Networks. Berlin/Heidelberg: Springer; 2008. pp. 33-82. DOI: 10.1007/978-3-540-85066-3_3
  14. 14. Chickering D, Heckerman D, Meek C. A Bayesian approach to learning Bayesian networks with local structure. In Proceedings of Thirteenth Conference on Uncertainty in Artificial Intelligence. 1997:80-89. Available from:
  15. 15. Cai J, Osher S, Shen Z. Split Bregman methods and frame based image restoration. Multiscale Modeling & Simulation. 2009;8(2):337-369. DOI: 10.1137/090753504
  16. 16. Boyd S, Parikh N, Chu E, Peleato B, Eckstein J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning. 2011;3(1):1-122. DOI: 10.1561/2200000016
  17. 17. Hoiem D, Efros A, Hebert M. Geometric context from a single image. IEEE International Conference on Computer Vision. 2005;1:654-661. DOI: 10.1109/ICCV.2005.107
  18. 18. Ji H, Luo Y, Shen Z. Image recovery via geometrically structured approximation. Applied and Computational Harmonic Analysis. 2016;41(1):75-93. DOI: 10.1016/j.acha.2015.08.012
  19. 19. Simoncelli E. Bayesian denoising of visual images in the wavelet domain. Bayesian Inference in Wavelet Based Models. 1999;141:291-308. DOI: 10.1007/978-1-4612-0567-8_18
  20. 20. Mallat S, Hwang W. Singularity detection and processing with wavelets. IEEE Transactions on Information Processing. 1992;38(2):617-643. DOI: 10.1109/18.119727
  21. 21. Milanfar P. A tour of modern image filtering: New insights and methods, both practical and theoretical. IEEE Signal Processing Magazine. 2013;30(1):106-128. DOI: 10.1109/MSP.2011.2179329
  22. 22. Sreedevi P, Hwang W, Lei S. An examplar-based approach for texture compaction synthesis and retrieval. IEEE Transactions on Image Processing. 2010;19(5):1307-1318. DOI: 10.1109/TIP.2009.2039665
  23. 23. Mairal J, Elad M, Sapiro G. Sparse representation for color image restoration. IEEE Transactions on Image Processing. 2008;17(1):53-69. DOI: 10.1109/TIP.2007.911828
  24. 24. Daubechies I, Defrise M, De-Mol C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics. 2004;57(11):1413-1457. DOI: 10.1002/cpa.20042
  25. 25. The USC-SIPI Image Database, Signal and Image Processing Institute, University of Southern California. Available from:
  26. 26. Hanif M, Seghouane A. Blind image deblurring using non-negative sparse approximation. IEEE International Conference on Image Processing. 2014. DOI: 10.1109/ICIP.2014.7025821


  • If the wavelet is normalized to have unit 1-norm, then the exponent of Eq. (4) should be α.
  • Let λ=λh,l. Then, λ≥0 if and only if λh,l≥0 for all h and l.
  • The second term at below is zero, when 2s is the coarsest scale.
  • The second term below is zero when nodes are at the finest scale.

Written By

Jinn Ho, Shih-Shuo Tung and Wen-Liang Hwang

Reviewed: January 14th, 2022 Published: February 27th, 2022