Open access peer-reviewed chapter

Image Restoration and Noise Reduction with Context-Dependent Wavelet Graph and ADMM Optimization

Written By

Jinn Ho, Shih-Shuo Tung and Wen-Liang Hwang

Reviewed: 14 January 2022 Published: 27 February 2022

DOI: 10.5772/intechopen.102681

From the Edited Volume

Recent Advances in Wavelet Transforms and Their Applications

Edited by Francisco Bulnes


Abstract

We represent the image noise reduction and restoration problems as context-dependent graphs and propose algorithms to derive the optimal graphs by the alternating direction method of multipliers (ADMM) method. An image is spatially decomposed into smooth regions and singular regions, consisting of edges and textures. The graph representing a smooth region is defined in the image domain, while that representing a singular region is defined in the wavelet domain. The optimal graphs are formulated as the solutions of constrained optimization problems over sparse graphs, where the sparseness is imposed on the edges. The graphs on the wavelet domain are solved in a hierarchical layer structure. The convergence and complexity of the algorithms have been studied. Simulation experiments demonstrate that the results of our algorithms are superior to the state-of-the-art algorithms for image noise reduction and restoration.

Keywords

  • image restoration
  • image denoising
  • graph
  • ADMM
  • wavelet

1. Introduction

We consider the inverse problem of deriving the original image $x \in \mathbb{R}^N$ from an observation $y \in \mathbb{R}^N$, expressed as

$$y = Hx + n, \tag{1}$$

where $H$ is an $N \times N$ matrix and $n \sim \mathcal{N}\left(0_{N\times 1}, \sigma^2 I_{N\times N}\right)$ is a vector of independent and identically distributed (i.i.d.) Gaussian random variables with standard deviation $\sigma$. We further assume that the point spread function $H$ is known. Imposing prior information on the desired image yields the estimate

$$\hat{x} = \arg\min_x \frac{1}{\sigma^2}\left\|y - Hx\right\|^2 + \mathrm{penalty}(x), \tag{2}$$

where the first term is the data fidelity term for the Gaussian observation model and the second term is the regularization term, measuring the penalty of a solution that deviates from the prior knowledge of the desired image. The modeling of the desired image is at the core of the approach [1, 2, 3, 4]. The primary challenge of solving the problem is to recover the local high-frequency information of edges and texture in the original image that is not present in the observation.
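As a rough numerical illustration of the estimator in Eq. (2), the sketch below solves the problem with a simple quadratic penalty $\tau\|x\|^2$ standing in for the graph-based penalty developed in this chapter; the matrix size, $\tau$, and the random test data are arbitrary choices for the example.

```python
import numpy as np

# Minimal sketch of problem (2) with a quadratic (Tikhonov) penalty tau*||x||^2
# standing in for the graph-based penalty of this chapter. With this penalty the
# minimizer has the closed form
#   x_hat = (H^T H / sigma^2 + tau*I)^{-1} (H^T y / sigma^2).
rng = np.random.default_rng(0)
N, sigma, tau = 64, 0.1, 0.5

x_true = rng.standard_normal(N)
H = rng.standard_normal((N, N)) / np.sqrt(N)      # toy point spread function
y = H @ x_true + sigma * rng.standard_normal(N)   # observation model (1)

x_hat = np.linalg.solve(H.T @ H / sigma**2 + tau * np.eye(N),
                        H.T @ y / sigma**2)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```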

If H is the identity matrix, problem (1) is called the noise reduction problem. The solutions vary with the type of noise contaminating the observation [5]. If the noise is white Gaussian noise, the state-of-the-art algorithms for the problem are BM3D [6], WBN [7], and DRUNet [8]. BM3D utilizes the tight frame representation of an image, where atoms of the frame are derived from image patches. WBN is a graphical probabilistic model of a weighted directed acyclic graph (DAG) in the wavelet domain. Different from BM3D and WBN, DRUNet is a deep learning method. It is a flexible and powerful deep CNN denoiser whose architecture combines U-Net [9] and ResNet [10]. It not only outperforms the state-of-the-art deep Gaussian denoising models but is also suitable for plug-and-play image restoration.

If H is a singular blur matrix, problem (1) is called the image restoration problem. Among optimization-based methods, the best image restoration performance, both subjective and objective, was achieved by the algorithm IDD-BM3D [11]. It utilizes sparse synthesis and analysis models and de-couples the problem into blur-inversion and noise reduction sub-problems, each of which is solved by a variational optimization approach. In deep learning, DPIR [8] replaces the denoising sub-problem of model-based optimization with a learning-based CNN denoiser prior, namely DRUNet, and restores the image by iteratively solving a data sub-problem and a prior sub-problem.

In this chapter, we also present a restoration algorithm that combines the noise reduction algorithm with the proximal point method [12]. The primary technical contributions of our methods are the context-dependent graphical representations and the algorithms to derive the optimal graphs of each representation. Finding the optimal graph in a combinatorial way is extremely difficult and likely an NP-hard problem [13, 14]. Unlike the combinatorial approach, we impose constraints on edges and include edges in the optimal graph only when the constraints on the edges are active. This renders a computationally solvable optimization problem and the solution is a graph with only a small number of active edges.

Based on the local content of an image, the context-dependent representation divides the image into singular and smooth areas. Singular areas, consisting of edges or texture, are represented and processed differently from the smooth areas. The graphs of singular areas are constructed based on the persistence and sparsity of the wavelet coefficients of the image. The persistence is imposed on the inter-scale edges so that the solution at one scale can be used to confine those at adjacent scales. Meanwhile, the sparsity is imposed on the intra-scale edges, preserving the edges whose end nodes have similar intensity. In contrast, a graph of a smooth area is defined in the image domain and has only sparse intra-scale edges.

The algorithm to derive the optimal graphs, called graphical ADMM, is based on the alternating direction method of multipliers (ADMM) method [15, 16]. It is an efficient and robust algorithm since it breaks a complicated problem into smaller pieces, each of which is easier to handle. In our case, the node update is separated from the edge update in the optimization. In addition, for wavelet graphs, graphical ADMM approximates the multi-scale optimization problem into a sequence of sub-problems; each can be efficiently solved by convex optimization methods.

The chapter is organized as follows. In Section 2, we present the models and the construction of the context-dependent graphs. In Section 3, we formulate the noise reduction problem as a graph optimization model and present the graphical ADMM method to derive optimal graphs. In Section 4, the image restoration problem is formulated via the proximal point method, which reduces the problem to a sequence of noise reduction problems, each solved by the method in Section 3. Section 5 presents experimental results and discusses the principal differences between our method and the compared methods. Section 6 contains concluding remarks.


2. Context-dependent graphical models

An image is comprised of features of edges, texture, and smooth areas. A common approach to obtaining a good image processing result is to treat different features with different approaches [17, 18]. Following this approach, an image is partitioned into two types of blocks. A block containing an edge point or texture is a singular block, while the others are smooth blocks. To keep the flow, we defer the description of the partitioning method to part A of Section 5; accurately partitioning an image is not necessary to achieve the performance demonstrated in this chapter. The singular and smooth blocks are handled with different graph optimization approaches: a singular block is treated in the wavelet domain, while a smooth block is treated in the image domain. In the wavelet domain, a singular block is represented by several weighted graphs, one corresponding to each orientation. If the wavelet transform has three orientations, LH, HL, and HH, then one graph is for the LH sub-bands, another for the HL sub-bands, and the third for the HH sub-bands. The graph for one orientation is constructed as follows.

Each wavelet coefficient is associated with a node. Edges are comprised of inter-scale and intra-scale edges. An inter-scale edge connecting nodes in adjacent scales can be directed either from a coarse scale to a finer scale or vice versa. The inter-scale edges are built-in and data-independent; they are constructed based on the wavelet persistence. In contrast, an intra-scale edge connecting nodes of the same scale is un-directed, data-dependent, and determined based on the sparseness from the graph optimization algorithm. Regularizations are imposed on inter-scale edges to preserve the persistence of wavelet coefficients across scales and on intra-scale edges to preserve the similarity of the wavelet coefficients on the nodes at the two ends of an edge.

2.1 Inter-scale edges

Since wavelets can characterize singularities, representing singularities with wavelets can facilitate the restoration of edges and texture in an image. The persistence property of wavelets means that wavelet coefficients exhibit dependency and correlation across scales. Thus, inter-scale edges are constructed to link the wavelet coefficients of the same orientation and location at adjacent scales. Moreover, the correlation of wavelet coefficients from a coarser scale to a finer scale is different from that from a finer scale to a coarser scale, so there are two types of inter-scale edges: coarse-to-fine and fine-to-coarse. The coarse-to-fine inter-scale correlation is derived from the statistical result of Simoncelli [19], who analyzed the correlation between the dyadic wavelet coefficients at a coarse scale and those at the same location and orientation at the immediately finer scale in a natural image. The coarse-to-fine inter-scale correlation of the wavelet coefficient $w_{p_i}$ at a coarse scale and the wavelet coefficient $w_i$ at the immediately finer scale can be represented in terms of minus log-probability as

$$k_1 \frac{w_i^2}{w_{p_i}^2}, \tag{3}$$

where $k_1$ is a parameter. Thus, given the wavelet coefficient $w_{p_i}$ at the coarse scale, Eq. (3) gives the minus log-probability of the wavelet coefficient $w_i$ at the same location and orientation at the immediately finer scale.

On the other hand, the fine-to-coarse inter-scale correlation is derived from the theoretical result on wavelet singularities, analyzed by Mallat and Hwang [20]. Let $w_i$ and $w_{c_i}$ be wavelet coefficients corresponding to the same singularity at adjacent scales. Then, $w_i$ and $w_{c_i}$ have the same sign, and the correlation, in terms of the ratio of the moduli of the wavelet coefficients, from $w_{c_i}$ at a fine scale to $w_i$ at the immediately coarser scale can be expressed as

$$\frac{w_i}{w_{c_i}} = \frac{\left|w_i\right|}{\left|w_{c_i}\right|} = 2^{\alpha + \frac{1}{2}}, \tag{4}$$

where $\alpha$ is the Lipschitz exponent of the singularity. If the singularity is a step edge, then $\alpha$ is 0. The exponent $\alpha + \frac{1}{2}$ in Eq. (4) depends on how the wavelet is normalized; here, the wavelet is normalized to have unit 2-norm (see Note 1). Eq. (4) can also be expressed in terms of minus log-probability as

$$k_2\left(w_i - 2^{\alpha + \frac{1}{2}}\, w_{c_i}\right)^2, \tag{5}$$

where $k_2$ is a parameter. If the type $\alpha$ of the singularity is known, then given the wavelet coefficient $w_{c_i}$ at the finer scale, Eq. (5) gives the minus log-probability of the wavelet coefficient $w_i$ at the coarser scale. Since step edges are the most salient features to be recovered in an image, we set $\alpha$ to 0 in this chapter.
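For concreteness, the two inter-scale penalties can be evaluated by the small helpers below; the function names and default parameter values are ours, not part of the chapter's notation.

```python
import numpy as np

def coarse_to_fine_penalty(w_i, w_parent, k1=1.0):
    """Eq. (3): k1 * w_i^2 / w_parent^2, given the parent coefficient w_parent."""
    return k1 * w_i**2 / w_parent**2

def fine_to_coarse_penalty(w_i, w_child, k2=1.0, alpha=0.0):
    """Eq. (5): k2 * (w_i - 2^(alpha + 1/2) * w_child)^2, given the child coefficient."""
    return k2 * (w_i - 2.0**(alpha + 0.5) * w_child)**2

# A step edge (alpha = 0) ideally satisfies |w_i| = sqrt(2) * |w_child| across
# adjacent scales, so the fine-to-coarse penalty vanishes for such a pair:
print(fine_to_coarse_penalty(w_i=np.sqrt(2.0), w_child=1.0))  # 0.0
```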

2.2 Intra-scale edges

A coherent or similar structure can be used to leverage the quality of the restoration [2, 21]. This is the principle behind the success of BM3D and the example-based approach in image processing [22]. Many similarity metrics have been proposed to derive the coherent structure, such as mutual information, kernel functions, and Pearson's correlation coefficient. In this chapter, Pearson's correlation coefficient is modified, for the technical reason explained below, to derive the intra-scale correlation of random variables $X$ and $Y$:

$$d(X, Y) = \max\left(0, \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y}\right) + q, \tag{6}$$

where $(\mu_X, \sigma_X)$ and $(\mu_Y, \sigma_Y)$ are the means and standard deviations of $X$ and $Y$, respectively, and $q > 0$ is an offset introduced to avoid $d(X,Y) = 0$ in the inequality constraints in Eq. (8). The value of $d(X,Y)$ lies in $[q, 1+q]$ and measures the similarity of $X$ and $Y$: the smaller $d(X,Y)$ is, the more independent $X$ and $Y$ are and the less likely they share a coherent structure. As shown in Figure 1, the coherent structure in an image is measured on all the coefficients of intra-scale edges whose endpoints occupy the same locations within a block in a sub-band.

Figure 1.

An illustrative example of how the metric $d_{h,l}$ is measured. There are 12 blocks in a sub-band. All the intra-scale edges between the $h$-th and the $l$-th nodes are collected. The coefficients on the $h$-th nodes form the random variable $X$ and those on the $l$-th nodes form the random variable $Y$. $d_{h,l}$ is then measured based on Eq. (6).

The intra-scale edges are determined by a sparsity constraint that aims to preserve the edges whose end nodes have similar values. The number of edges is controlled by the parameter $r$ in the constraint:

$$d(y_h, y_l)\left|x_h - x_l\right| \le r, \tag{7}$$

where $d(y_h, y_l)$ is defined in Eq. (6) and obtained from the observed image, and $x_h$ and $x_l$ are the coefficients at the $h$-th and $l$-th nodes, respectively. If the observed values $y_h$ and $y_l$ are similar, the value of $d(y_h, y_l)$ is large, so $\left|x_h - x_l\right|$ must be small to satisfy the constraint. This preserves the similarity between $x_h$ and $x_l$. Only the edges satisfying Eq. (7) are retained in the optimal graph. In the following, $d_{h,l}$ is used as shorthand for $d(y_h, y_l)$ in Eq. (7).
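A minimal sketch of the similarity metric (6) and the edge constraint (7), assuming sample estimates of the moments; the offset $q$ and the test data below are arbitrary.

```python
import numpy as np

def intra_scale_similarity(X, Y, q=0.01):
    """Modified Pearson correlation of Eq. (6); the value lies in [q, 1 + q]."""
    cov = np.mean((X - X.mean()) * (Y - Y.mean()))
    return max(0.0, cov / (X.std() * Y.std())) + q

# Edge (h, l) survives in the optimal graph only when the constraint of
# Eq. (7), d_{h,l} * |x_h - x_l| <= r, is satisfied (active at the optimum).
rng = np.random.default_rng(1)
X = rng.standard_normal(100)
Y = 0.9 * X + 0.1 * rng.standard_normal(100)   # strongly coherent pair
d_hl = intra_scale_similarity(X, Y)
x_h, x_l, r = 1.00, 1.02, 0.1
print(d_hl, d_hl * abs(x_h - x_l) <= r)        # a large d_hl forces |x_h - x_l| to be small
```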

2.3 Graph construction

The aforementioned elements are now integrated and summarized into our context-dependent representation of an image. An image is divided into blocks. Each block is classified as either a singular block or a smooth block. A singular block is then represented with the dyadic wavelet transform, where the scale is sampled following a geometric sequence of ratio 2 and the spatial domain is not down-sampled. The dyadic wavelet transform of an image is comprised of four sub-bands, LL, LH, HL, and HH, with the last three being the orientation sub-bands. A singular block is associated with three graphs, one for each orientation sub-band. Since smooth blocks can be well restored in the image domain, the wavelet transform is not applied to them, and a smooth block is associated with a graph in the image domain. Each graph, associated with a singular block or a smooth block, is constructed independently of the other graphs. Figure 2 illustrates an example of the graph representation for a block of four pixels.

Figure 2.

A block of four pixels can be a smooth block (top) or a singular block (bottom). A smooth block is processed in the image domain. A singular block is processed in the wavelet domain, where a multi-scale graph is associated with each orientation. The blue and green edges are built-in directed inter-scale edges and the red edges are un-directed intra-scale edges. An inter-scale edge connects nodes at the same location and orientation at adjacent scales. A green edge is coarse-to-fine, linking a node to its parent, while a blue edge is fine-to-coarse, linking a node to its child. The intra-scale edges are determined by graphical ADMM, which decomposes the optimization into node updates and edge updates.


3. Optimal graphs for noise reduction

The noise reduction problem corresponds to problem (1), where H is the identity matrix. Since each graph is solved independently of the other graphs, the following discussion is focused on one graph. A graph can be associated with a singular block or a smooth block.

3.1 Singular blocks

Let $y_i$ and $x_i$ be the wavelet coefficients associated with the $i$-th node in a sub-band of the observed image and the original image, respectively. The parent node of node $i$ is denoted $p_i$ and its child node $c_i$. Let $z_{h,l}$ be the variable defined on the intra-scale edge that connects the $h$-th and $l$-th nodes, which are at the same scale and orientation. The optimal wavelet graph can be derived by solving

$$\min_{x,\,z} \sum_i \left[\frac{(y_i - x_i)^2}{2\sigma^2} + k_1\frac{x_i^2}{x_{p_i}^2} + k_2\left(x_i - \sqrt{2}\,x_{c_i}\right)^2\right] \quad \text{s.t. } d_{h,l}\left|z_{h,l}\right| \le r,\;\; z_{h,l} = x_h - x_l, \tag{8}$$

where $r$, $\sigma$, $k_1$, and $k_2$ are non-negative parameters and the constraints are designed as explained under Eq. (7), with $r$ controlling the number of edges in the optimal graph. The factor $\sqrt{2}$ in the third term is $2^{\alpha + \frac{1}{2}}$ of Eq. (5) with $\alpha = 0$. If a node is at the coarsest scale $2^J$, where $J$ is the number of decomposition levels of the wavelet transform, it has no parent node and the second term in the objective of Eq. (8) is zero; the third term is set to zero for a node at the finest scale $2^1$, as it has no child node.

Problem (8) has a convenient matrix representation. Let $x = (x_i)$ and $z = (z_{h,l})$ be the vectors of variables on the nodes and intra-scale edges, respectively. Then, the linear constraints between $z_{h,l}$, $x_h$, and $x_l$ in Eq. (8) can be expressed as

$$A\left[x^T\; z^T\right]^T = 0, \tag{9}$$

where $A$ is a matrix with elements 1, $-1$, or 0. Let $A = [A_x\; A_z]$. It follows that

$$A\left[x^T\; z^T\right]^T = A_x x + A_z z. \tag{10}$$

Each row of $A_x$ has one element of value 1, another element of value $-1$, and the rest of value 0. Meanwhile, each row of $A_z$ has one element of value $-1$ and the rest of value 0. Let $\lambda = (\lambda_{h,l}) \ge 0$ (see Note 2) and $\mu$ be the vectors of Lagrangian variables associated with the inequality constraints and the equality constraints in Eq. (8), respectively. The augmented Lagrangian of Eq. (8) is

$$L_\rho(x, z, \lambda, \mu) = \sum_i \left[\frac{(y_i - x_i)^2}{2\sigma^2} + k_1\frac{x_i^2}{x_{p_i}^2} + k_2\left(x_i - \sqrt{2}\,x_{c_i}\right)^2\right] + \mu^T\left(A_x x + A_z z\right) + \frac{\rho}{2}\left\|A_x x + A_z z\right\|^2 + \sum_{h,l}\lambda_{h,l}\, d_{h,l}\left|z_{h,l}\right| - r\,\mathbf{1}^T\lambda, \tag{11}$$

where $\mathbf{1}$ is the all-ones vector and $\rho > 0$ is a fixed parameter.

The ADMM algorithm intends to blend the decomposability of dual ascent with the superior convergence properties of the method of multipliers. Here, ADMM is used to derive the optimal graph by separating the node and edge update. Graphical ADMM derives the saddle points of

$$\max_{\lambda \ge 0,\, \mu}\; \min_{x,\, z}\; L_\rho(x, z, \lambda, \mu) \tag{12}$$

by iterating the following updates:

$$\begin{aligned}
x^{k+1} &= \arg\min_x L_\rho\left(x, z^k, \lambda^k, \mu^k\right);\\
z^{k+1} &= Q\left(\arg\min_z L_\rho\left(x^{k+1}, z, \lambda^k, \mu^k\right)\right);\\
\lambda_{h,l}^{k+1} &= P_{\ge 0}\left(\lambda_{h,l}^k - \varepsilon\left(r - d_{h,l}\left|z_{h,l}^{k+1}\right|\right)\right);\\
\mu^{k+1} &= \mu^k + \rho\left(A_x x^{k+1} + A_z z^{k+1}\right);
\end{aligned} \tag{13}$$

where $Q$ and $P_{\ge 0}$ are the orthogonal projections enforcing the constraints $d_{h,l}\left|z_{h,l}\right| \le r$ and $\lambda_{h,l} \ge 0$, respectively, and $\varepsilon > 0$ is the step size for the dual ascent of the Lagrangian variables $\lambda$. The first and second updates in Eq. (13) update the node variables and edge variables, respectively. The update of the dual variables $\lambda$ is derived from the necessary conditions at an optimum of Eq. (8):

$$\lambda_{h,l} \ge 0; \qquad \lambda_{h,l}\left(r - d_{h,l}\left|z_{h,l}^{k+1}\right|\right) = 0. \tag{14}$$

The third update in Eq. (13) has the following interpretation. If $r - d_{h,l}\left|z_{h,l}^{k+1}\right| > 0$, then $\lambda_{h,l}^k$ decreases, with $P_{\ge 0}$ keeping its value non-negative. The value of $\lambda_{h,l}$ is repeatedly decreased as the iteration number $k$ increases until either $\lambda_{h,l} = 0$ or $r - d_{h,l}\left|z_{h,l}^{k+1}\right| = 0$, in which case the constraint on the edge is active and the optimality conditions (14) are satisfied. At an optimum, either the Lagrangian variable associated with an edge is zero or the constraint on the edge is active. Only the active edges are retained in the graph; since the number of active edges is small, the edges in the optimal graph are sparse. The update rules for the edge and node variables are derived in Sections 3.3 and 3.4, respectively.
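The following skeleton sketches one possible organization of the iterations in Eq. (13). The node and edge solvers are left as callbacks, since their closed forms are derived in Sections 3.4 and 3.3; the iteration count, step size, and active-edge tolerance are our own simplifications.

```python
import numpy as np

def graphical_admm(y, d, r, solve_nodes, solve_edges, Ax, Az,
                   rho=1.0, eps=0.1, iters=100):
    """Sketch of the update schedule of Eq. (13).

    solve_nodes and solve_edges are callbacks implementing the primal updates
    of Sections 3.4 and 3.3; d and r define the edge constraints
    d_{h,l} * |z_{h,l}| <= r on the candidate intra-scale edges.
    """
    E = Az.shape[1]                  # number of candidate intra-scale edges
    x = y.copy()
    z = np.zeros(E)
    lam = np.zeros(E)                # multipliers of the inequality constraints
    mu = np.zeros(Ax.shape[0])       # multipliers of z_{h,l} = x_h - x_l
    for _ in range(iters):
        x = solve_nodes(y, z, mu)                                # node update
        z = solve_edges(x, lam, mu)                              # edge update (incl. Q)
        lam = np.maximum(0.0, lam - eps * (r - d * np.abs(z)))   # dual ascent with P_{>=0}
        mu = mu + rho * (Ax @ x + Az @ z)                        # multiplier update
    active = d * np.abs(z) >= r - 1e-9   # only edges with active constraints are kept
    return x, z, active
```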

3.2 Smooth blocks

A smooth block is processed in the image domain, where each pixel is associated with a node in a graph. Problem (8) becomes finding the optimal graph by solving

$$\min_{x,\,z} \sum_i \frac{(y_i - x_i)^2}{2\sigma^2} \quad \text{s.t. } d_{h,l}\left|z_{h,l}\right| \le r,\;\; z_{h,l} = x_h - x_l. \tag{15}$$

The optimal graph can be derived by a method similar to that for a singular block.

3.3 Update edges

The update rule for the edge variables $z$ is obtained by solving, with $x$, $\lambda$, and $\mu$ fixed,

$$\min_z L_\rho(x, z, \lambda, \mu) \quad \text{s.t. } d_{h,l}\left|z_{h,l}\right| \le r. \tag{16}$$

Keeping only the terms of the augmented Lagrangian (11) relevant to the optimization variables $z_{h,l}$, Eq. (16) becomes

$$\min_{z_{h,l}} \sum_{h,l}\left[-\mu_{h,l}\, z_{h,l} + \frac{\rho}{2}\left(x_h - x_l - z_{h,l}\right)^2 + \lambda_{h,l}\, d_{h,l}\left|z_{h,l}\right|\right] \quad \text{s.t. } \left|z_{h,l}\right| \le \frac{r}{d_{h,l}}, \tag{17}$$

because $d_{h,l}$ is non-zero by the definition in Eq. (6). Equation (17) indicates that each edge variable can be updated independently of the others.

We first solve Eq. (17) without the constraint and obtain the solution $u_{h,l}$ of

$$\arg\min_{z_{h,l}}\; -\mu_{h,l}\, z_{h,l} + \frac{\rho}{2}\left(x_h - x_l - z_{h,l}\right)^2 + \lambda_{h,l}\, d_{h,l}\left|z_{h,l}\right|, \tag{18}$$

by soft-thresholding with

$$u_{h,l} = \begin{cases}
x_h - x_l + \frac{\mu_{h,l}}{\rho} - \frac{\lambda_{h,l}\, d_{h,l}}{\rho}, & \text{if } x_h - x_l + \frac{\mu_{h,l}}{\rho} \ge \frac{\lambda_{h,l}\, d_{h,l}}{\rho};\\
x_h - x_l + \frac{\mu_{h,l}}{\rho} + \frac{\lambda_{h,l}\, d_{h,l}}{\rho}, & \text{if } x_h - x_l + \frac{\mu_{h,l}}{\rho} \le -\frac{\lambda_{h,l}\, d_{h,l}}{\rho};\\
0, & \text{otherwise}.
\end{cases} \tag{19}$$

It is then followed by orthogonally projecting $u_{h,l}$ to satisfy the constraint by solving

$$\min_{z_{h,l}} \frac{1}{2}\left(z_{h,l} - u_{h,l}\right)^2 \quad \text{s.t. } \left|z_{h,l}\right| \le \frac{r}{d_{h,l}}. \tag{20}$$

Eq. (20) can be solved by a sequence of soft-thresholding operations. The algorithm is sketched as follows. First, we check whether $\left|u_{h,l}\right| \le \frac{r}{d_{h,l}}$. If it is, $u_{h,l}$ is the solution. Otherwise, we begin with a small $\gamma$ and solve

$$z_{h,l}^+ = \arg\min_{z_{h,l}} \frac{1}{2}\left(z_{h,l} - u_{h,l}\right)^2 + \gamma\left|z_{h,l}\right|. \tag{21}$$

The solution is the soft-thresholding as

$$z_{h,l}^+ = \begin{cases}
0, & \text{if } \left|u_{h,l}\right| \le \gamma;\\
\left(1 - \frac{\gamma}{\left|u_{h,l}\right|}\right)u_{h,l}, & \text{otherwise}.
\end{cases} \tag{22}$$

If $\left|z_{h,l}^+\right| \le \frac{r}{d_{h,l}}$, then $z_{h,l}$ is updated to $z_{h,l}^+$ and the algorithm stops. Otherwise, $\gamma$ is increased and Eq. (22) is evaluated again. Since increasing $\gamma$ decreases $\left|z_{h,l}^+\right|$, this algorithm always stops and updates the edge variable $z_{h,l}$ to meet the constraint.
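Combining the soft-thresholding of Eq. (19) with the projection loop of Eqs. (20)-(22), the update of a single edge might look as follows; the initial $\gamma$ and its growth factor are arbitrary choices. (The exact projection onto $\left|z_{h,l}\right| \le r/d_{h,l}$ is a simple clipping; the shrinkage loop below follows the chapter's description instead.)

```python
import numpy as np

def update_edge(xh, xl, mu, lam, d, r, rho=1.0, gamma0=1e-3, grow=2.0):
    """Edge update of Section 3.3: soft-thresholding (19), then the
    projection (20) solved by the sequence of shrinkages (21)-(22)."""
    v = xh - xl + mu / rho
    t = lam * d / rho
    u = np.sign(v) * max(abs(v) - t, 0.0)     # soft-thresholding, Eq. (19)
    bound = r / d
    if abs(u) <= bound:                       # constraint already satisfied
        return u
    z, gamma = u, gamma0
    while abs(z) > bound:                     # increase gamma until feasible
        z = 0.0 if abs(u) <= gamma else (1.0 - gamma / abs(u)) * u  # Eq. (22)
        gamma *= grow
    return z

print(update_edge(xh=1.0, xl=0.2, mu=0.0, lam=0.5, d=0.8, r=0.3))
```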

We now analyze the complexity of updating the edge variables. If a block has $n$ pixels and the dyadic wavelet transform takes $J$ scales, then the number of edge constraints on a singular block is $O(Jn^2)$ and the number of edge constraints on a smooth block is $O(n^2)$. Let $K_1$ be the maximum number of iterations needed to solve Eq. (20) over all graphs. The complexity of one edge update is then $O\left(K_1 n^2 (J_F + 3 J_W J)\right)$, where 3 is the number of orientations and $J_W$ and $J_F$ are the numbers of singular blocks and smooth blocks, respectively.

3.4 Update nodes

The node update for a singular block is more complicated than that for a smooth block because a graph for a singular block has a multi-scale structure, where adjacent scales are linked by inter-scale edges.

3.5 Singular blocks

To update the nodes $x$ in a singular block is to solve, from the augmented Lagrangian function (11),

$$\arg\min_x L_\rho(x, z, \lambda, \mu), \tag{23}$$

for given $z$, $\lambda$, and $\mu$. This is not a convex problem because the second term in Eq. (11) is non-convex.

Our approach is to decompose the problem, based on the scale parameter, into a sequence of sub-problems. Each scale is associated with two convex sub-problems: a coarse-to-fine sub-problem and a fine-to-coarse sub-problem. The coarse-to-fine sub-problem assumes that the parent nodes at scale $2^{s+1}$ were updated earlier, while the fine-to-coarse sub-problem assumes that the child nodes at scale $2^{s-1}$ were updated earlier. Let $k$ be the current iteration number. The coarse-to-fine sub-problem updates the nodes at scale $2^s$ by minimizing (see Note 3)

$$\sum_i \left[\frac{(y_i - x_i)^2}{2\sigma^2} + k_1\frac{x_i^2}{\left(x_{p_i}^k\right)^2}\right] + \sum_{h,l}\left[\mu_{h,l}^{k-1}\left(\left(x_h - x_l\right) - z_{h,l}^{k-1}\right) + \frac{\rho}{2}\left(\left(x_h - x_l\right) - z_{h,l}^{k-1}\right)^2\right]. \tag{24}$$

On the other hand, the fine-to-coarse sub-problem updates the nodes at scale $2^s$ by minimizing (see Note 4)

$$\sum_i \left[\frac{(y_i - x_i)^2}{2\sigma^2} + k_2\left(x_i - \sqrt{2}\,x_{c_i}^k\right)^2\right] + \sum_{h,l}\left[\mu_{h,l}^{k-1}\left(\left(x_h - x_l\right) - z_{h,l}^{k-1}\right) + \frac{\rho}{2}\left(\left(x_h - x_l\right) - z_{h,l}^{k-1}\right)^2\right]. \tag{25}$$

The node update problem (23) can then be approximated by repeatedly solving the coarse-to-fine iteration followed by the fine-to-coarse iteration. The coarse-to-fine iteration solves a sequence of the coarse-to-fine sub-problems beginning at the coarsest scale. In contrast, the fine-to-coarse iteration solves a sequence of the fine-to-coarse sub-problems beginning at the finest scale.

Problems (24) and (25) can be solved efficiently. The objectives of the sub-problems are strictly convex because their Hessian matrices are positive definite (as can be observed from the inverted matrices in Eqs. (26) and (27)); thus, the optimal solution of each is unique. The closed-form solutions of the sub-problems are derived as follows.

For convenience, we omit all the superscript indices in Eqs. (24) and (25) and let $A_x^s$ and $A_z^s$ denote the sub-matrices of $A_x$ and $A_z$, respectively, retaining only the rows and columns corresponding to the nodes and edges at scale $2^s$. We also let $x^s$, $z^s$, and $\mu^s$ denote the vectors of node variables, edge variables, and Lagrangian variables at scale $2^s$. The closed-form solution of $x^s$ for the coarse-to-fine sub-problem is

$$x^s = \begin{cases}
\left(\frac{1}{\sigma^2}I + C_{s+1}^T C_{s+1} + \rho A_x^{sT} A_x^s\right)^{-1}\left(\frac{1}{\sigma^2}y^s - A_x^{sT}\left(\mu^s + \rho A_z^s z^s\right)\right), & \text{for } s \ne J;\\[4pt]
\left(\frac{1}{\sigma^2}I + \rho A_x^{sT} A_x^s\right)^{-1}\left(\frac{1}{\sigma^2}y^s - A_x^{sT}\left(\mu^s + \rho A_z^s z^s\right)\right), & \text{for } s = J;
\end{cases} \tag{26}$$

where $C_{s+1}$ is a diagonal matrix whose $(i,i)$-th diagonal element is $\frac{\sqrt{2k_1}}{x_{p_i}}$ and $2^J$ is the coarsest scale. On the other hand, the closed-form solution of $x^s$ for the fine-to-coarse sub-problem is

$$x^s = \begin{cases}
\left(\left(\frac{1}{\sigma^2} + 2k_2\right)I + \rho A_x^{sT} A_x^s\right)^{-1}\left(\frac{1}{\sigma^2}y^s + 2\sqrt{2}\,k_2\, x^{s-1} - A_x^{sT}\left(\mu^s + \rho A_z^s z^s\right)\right), & \text{for } s \ge 2;\\[4pt]
\left(\frac{1}{\sigma^2}I + \rho A_x^{sT} A_x^s\right)^{-1}\left(\frac{1}{\sigma^2}y^s - A_x^{sT}\left(\mu^s + \rho A_z^s z^s\right)\right), & \text{for } s = 1.
\end{cases} \tag{27}$$

The complexity of the matrix inversions in Eqs. (26) and (27) is low since $A_x^s$ is a sparse matrix: each row of $A_x^s$ has at most one 1 and one $-1$, and the rest are zero. The complexity of sparse matrix inversion in Matlab is proportional to the number of non-zero elements in the matrix. Thus, one iteration of either the coarse-to-fine or the fine-to-coarse update of a graph takes $O(Jn)$ operations, where $J$ is the number of decomposition levels and $n$ is the number of pixels at a scale.
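A sketch of the fine-to-coarse solve of Eq. (27) for one scale with $s \ge 2$, using sparse matrices as suggested above; the toy edge set, parameter values, and coefficient vectors are invented for the example.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

# Sketch of the fine-to-coarse node update, Eq. (27), at one scale (s >= 2).
# A_x has one +1 and one -1 per row (one row per intra-scale edge); A_z = -I.
n, sigma, rho, k2 = 4, 0.1, 1.0, 0.5
edges = [(0, 1), (1, 2), (2, 3)]                       # toy intra-scale edges
rows = [e for e in range(len(edges)) for _ in (0, 1)]
cols = [node for (h, l) in edges for node in (h, l)]
vals = [v for _ in edges for v in (1.0, -1.0)]
Ax = sp.csr_matrix((vals, (rows, cols)), shape=(len(edges), n))
Az = -sp.identity(len(edges), format="csr")

y = np.array([1.0, 0.9, 0.2, 0.1])                     # coefficients at scale 2^s
x_child = np.array([0.7, 0.6, 0.15, 0.05])             # updated children at scale 2^(s-1)
z = np.zeros(len(edges)); mu = np.zeros(len(edges))

M = (1.0 / sigma**2 + 2.0 * k2) * sp.identity(n) + rho * (Ax.T @ Ax)
b = y / sigma**2 + 2.0 * np.sqrt(2.0) * k2 * x_child - Ax.T @ (mu + rho * (Az @ z))
print(spsolve(M.tocsr(), b))                           # closed-form x^s
```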

3.6 Smooth blocks

The node update for a smooth block can be derived analytically from problem (15) with $z_{h,l}$ given. With the superscript indices omitted, the closed-form solution is

$$\left(\frac{1}{\sigma^2}I + \rho A_x^T A_x\right)^{-1}\left(\frac{1}{\sigma^2}y - A_x^T\left(\mu + \rho A_z z\right)\right), \tag{28}$$

where $A_x$ and $A_z$ are defined in Eq. (10). The complexity of inverting the sparse matrix $\frac{1}{\sigma^2}I + \rho A_x^T A_x$ is proportional to the number of non-zero elements in the matrix, which is $O(n)$, where $n$ is the number of pixels in a block.

If an image has $J_W$ singular blocks and $J_F$ smooth blocks, and if $J$ is the number of wavelet decomposition levels, the total complexity of the node updates of the image is $O\left(n\left(3 K_2 J J_W + J_F\right)\right)$, where $K_2$ is the maximum number of coarse-to-fine and fine-to-coarse node update iterations for a singular block.

Graphical ADMM consists of a sequence of updates of the primal and dual variables, and its complexity is dominated by the primal variable updates. Our analysis of one iteration of Eq. (13) indicates that the costs of the node updates and edge updates are $O\left(n\left(3 K_2 J J_W + J_F\right)\right)$ and $O\left(K_1 n^2\left(3 J_W J + J_F\right)\right)$, respectively, where $n$ is the number of pixels in a block.


4. Optimal graphs for image restoration

There are various image restoration methods [23]. Here, we use the proximal approach proposed in [24] and [12]. The method smartly reduces the image restoration problem to a sequence of noise reduction problems. Since graphical ADMM for noise reduction is efficient, it can be adopted to derive the optimal graphs for image restoration. As with noise reduction, each graph is handled independently of the other graphs. The following discussion focuses on deriving the optimal graph for one block.

Let $h(x)$ be the objective function

$$h(x) = \frac{1}{2}\left\|y - Hx\right\|^2, \tag{29}$$

with a known blur kernel $H$, and let $x_0$ be the vector of the current restored image. The proximity function is defined as

$$d_H(x, x_0) = \frac{\beta}{2}\left\|x - x_0\right\|^2 - \frac{1}{2}\left\|Hx - Hx_0\right\|^2. \tag{30}$$

The parameter $\beta$ is chosen so that $d_H(x, x_0)$ is strictly convex with respect to $x$. This requires its Hessian $\beta I - H^T H$ to be positive definite, which can be achieved by choosing $\beta > \lambda_{\max}\left(H^T H\right)$, the maximal eigenvalue of the matrix $H^T H$.
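Since only $\beta > \lambda_{\max}(H^T H)$ is needed, a simple power iteration suffices to pick it; the margin factor and iteration count below are arbitrary.

```python
import numpy as np

def beta_for_proximity(H, iters=200, margin=1.01, seed=0):
    """Pick beta > lambda_max(H^T H) via power iteration, so that the Hessian
    beta*I - H^T H is positive definite and d_H of Eq. (30) is strictly convex."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(H.shape[1])
    for _ in range(iters):
        w = H.T @ (H @ v)                 # apply H^T H
        v = w / np.linalg.norm(w)
    lam_max = v @ (H.T @ (H @ v))         # Rayleigh quotient estimate
    return margin * lam_max

H = np.random.default_rng(1).standard_normal((32, 32)) / 6.0
print(beta_for_proximity(H), np.linalg.eigvalsh(H.T @ H).max())
```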

The proximal objective is defined as

$$\tilde{h}(x, x_0) = h(x) + d_H(x, x_0). \tag{31}$$

Expanding the above objective and discarding the terms in which $Hx$ cancels, the proximal objective takes the simpler form

$$\tilde{h}(x, x_0) = \frac{\beta}{2}\left\|x - \left(x_0 + \frac{1}{\beta}H^T\left(y - Hx_0\right)\right)\right\|^2 + K, \tag{32}$$

where $K$ contains terms unrelated to $x$. Since $x_0$, $\beta$, $H$, and $y$ are given, the proximal objective can be regarded as a noise reduction problem with the observation vector $x_0 + \frac{1}{\beta}H^T\left(y - Hx_0\right)$. Thus, $\tilde{h}(x, x_0)$ can serve as the first term in the noise reduction problem (8), and the noise reduction algorithm can be used to derive the optimal graph by separating the node and edge updates following Eq. (13) and the procedures in Section 3.
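The resulting outer loop is sketched below, with the graphical-ADMM noise reducer abstracted as a `denoise` callback. For a self-contained check we plug in the identity denoiser, in which case the iteration reduces to gradient-like steps on $\frac{1}{2}\|y - Hx\|^2$ and drives the residual toward the least-squares fit.

```python
import numpy as np

def restore(y, H, denoise, beta, outer_iters=500):
    """Proximal point outer loop of Section 4: each iterate maps restoration to
    the noise reduction problem (8) through the observation vector of Eq. (32).
    `denoise` stands in for the graphical-ADMM noise reducer."""
    x0 = y.copy()
    for _ in range(outer_iters):
        pseudo_obs = x0 + (1.0 / beta) * (H.T @ (y - H @ x0))  # observation of Eq. (32)
        x0 = denoise(pseudo_obs)
    return x0

rng = np.random.default_rng(2)
H = rng.standard_normal((16, 16)) / 4.0
x_true = rng.standard_normal(16)
y = H @ x_true
beta = 2.0 * np.linalg.eigvalsh(H.T @ H).max()     # satisfies beta > lambda_max
x = restore(y, H, denoise=lambda v: v, beta=beta)
print("residual:", np.linalg.norm(H @ x - y))
```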


5. Experiments and comparisons

We consider several image denoising and deblurring scenarios used as benchmarks in state-of-the-art algorithms for performance evaluation and comparison. The experimental setting is as follows. The experiments were conducted on the images in Sets I and II in Figure 3. Set I contains six grayscale natural images, Einstein, Boat, Barbara, Lena, Cameraman, and House; the size of each image is 512×512 or 256×256, downloaded from the USC-SIPI image database [25]. Set II contains six grayscale textures, some taken from the Brodatz texture set. Throughout the experiments, each image is divided into 16 equal-sized blocks. A singular block is decomposed into four scales with the dyadic wavelet transform using the CDF 9/7 wavelet filters. Since the CDF 9/7 filters are close to orthogonal wavelet filters, the noise variance at any sub-band can be set to $\sigma^2$, the variance of the noise in the image domain [7].
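The block decomposition and the multi-scale transform can be reproduced along the following lines with PyWavelets, whose 'bior4.4' wavelet corresponds to the CDF 9/7 filters; the block classifier shown is a hypothetical stand-in for the rule described in part A of Section 5.

```python
import numpy as np
import pywt  # PyWavelets; 'bior4.4' implements the CDF 9/7 filter pair

# Split a 512x512 image into 16 equal blocks and give each singular block a
# 4-scale undecimated (dyadic) wavelet transform, as in the experiments.
img = np.random.default_rng(3).standard_normal((512, 512))
blocks = [img[i:i + 128, j:j + 128]
          for i in range(0, 512, 128) for j in range(0, 512, 128)]

def is_singular(block, thresh=0.5):
    """Hypothetical classifier; the chapter's actual rule is in part A of Sec. 5."""
    return np.std(np.gradient(block)[0]) > thresh

coeffs = [pywt.swt2(b, "bior4.4", level=4) if is_singular(b) else None
          for b in blocks]
print(sum(c is not None for c in coeffs), "singular blocks out of", len(blocks))
```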

Figure 3.

The images in Set I (first row) and Set II (second row). Set I contains six grayscale natural images and Set II contains six grayscale textures.

5.1 Noise reduction performance

Our noise reduction performance was compared against that of BM3D, WBN, and DRUNet. The perceptual quality of the methods is shown in Figure 4. The Lena image of BM3D over-smooths the highlighted area of the hat, which is rich in edges and textures. Similarly, the textures in the highlighted area of the hat are smoothed out by DRUNet. The image of WBN, on the contrary, under-smooths the highlighted smooth areas around the chin and shoulder of Lena. These artifacts are amended by graphical ADMM, as shown in Figure 4f.

Figure 4.

Comparisons of the denoised Lena images derived by BM3D, WBN, DRUNet, and graphical ADMM. The noise standard deviation is set at $\sigma = 25$: (a) the original 512×512 Lena image; (b) the noisy image; (c) the result of BM3D; (d) the result of WBN; (e) the result of DRUNet; and (f) the result of graphical ADMM. Graphical ADMM preserves both the smooth and edged areas of the original image, as shown in the highlighted areas.

The quantitative comparisons, measured by the peak signal-to-noise ratio (PSNR), for Set I and Set II are shown in Tables 1 and 2, respectively. The test images were contaminated with noise of variance $\sigma^2$. As shown, the deep learning-based method (DRUNet) achieves the highest score in almost all environments. However, among the optimization-based methods (BM3D, WBN, proposed), graphical ADMM uniformly achieves the highest score in all environments.

Image (size)          Method      σ=10      σ=15      σ=20      σ=25      σ=30      σ=35
Einstein (512×512)    BM3D        34.4392   33.0331   32.1694   31.4186   30.8709   30.3777
                      WBN         34.4848   33.0821   32.3429   31.4728   30.9178   30.4109
                      Proposed    34.5013   33.1005   32.3544   31.4862   30.9306   30.4255
                      DRUNet      34.9948   33.6019   32.7411   32.1092   31.5952   31.1622
Boat (512×512)        BM3D        33.8883   32.1067   30.8554   29.8356   29.0954   28.2992
                      WBN         33.9095   32.1369   30.8874   29.8561   29.1273   28.3285
                      Proposed    33.9241   32.1504   30.9063   29.8772   29.1416   28.3410
                      DRUNet      34.4264   32.7123   31.5194   30.5768   29.8391   29.1826
Barbara (512×512)     BM3D        34.9567   33.0666   31.7376   30.7176   29.7049   28.8879
                      WBN         34.9643   33.0831   31.7515   30.7332   29.7233   28.4571
                      Proposed    34.9704   33.1012   31.7735   30.7468   29.7458   28.4675
                      DRUNet      35.2115   33.4389   32.1951   31.2341   30.4275   29.7520
Lena (512×512)        BM3D        36.6367   34.8782   33.0567   32.5501   31.6531   31.0301
                      WBN         36.6354   34.8886   33.3048   32.4488   31.5617   30.9148
                      Proposed    36.6447   34.8960   33.3065   32.5632   31.6744   31.1022
                      DRUNet      36.4431   34.9269   33.8363   32.9669   32.2285   31.6072
Cameraman (256×256)   BM3D        34.1355   31.8449   30.3797   29.4118   28.5516   27.8758
                      WBN         34.1637   31.8675   30.5629   29.5723   28.7274   28.0487
                      Proposed    34.1668   31.8776   30.5732   29.5804   28.7335   28.0614
                      DRUNet      34.9927   32.9133   31.5788   30.6079   29.8462   29.2131
House (256×256)       BM3D        36.6638   34.9028   33.7349   32.9084   32.1240   31.5103
                      WBN         36.8538   34.9302   33.7620   32.9363   32.1571   31.5390
                      Proposed    36.8665   34.9367   33.7742   32.9420   32.1643   31.5404
                      DRUNet      37.4420   35.8267   34.7084   33.9251   33.2517   32.6733

Table 1.

Comparisons of the PSNRs of the noise reduction methods on the noisy images of Set I with noise of standard deviation σ.

Image (size)     Method      σ=10      σ=15      σ=20      σ=25      σ=30      σ=35
F0 (512×512)     BM3D        32.5200   30.3213   28.8595   27.7279   26.7751   25.9243
                 WBN         32.6212   30.4143   28.9314   27.8087   26.8644   26.0136
                 Proposed    32.6338   30.4365   28.9572   27.8276   26.8867   26.0345
                 DRUNet      33.7186   31.7164   30.2770   29.1838   28.3420   27.6675
F3 (512×512)     BM3D        29.3575   26.3832   24.3540   22.8154   21.6195   20.6102
                 WBN         29.5361   26.5738   24.5363   23.0236   21.8514   20.9144
                 Proposed    29.5464   26.5896   24.5554   23.0467   21.8722   20.9364
                 DRUNet      30.0832   27.2371   25.3141   23.8812   22.7516   21.8235
F7 (512×512)     BM3D        29.8987   27.3043   25.6216   24.3522   23.4010   22.5878
                 WBN         30.1253   27.3765   25.6782   24.4186   23.4862   22.6574
                 Proposed    30.1470   27.3932   25.6914   24.4375   23.5002   22.6786
                 DRUNet      30.8467   28.2704   26.5578   25.3201   24.3751   23.6274
g3 (512×512)     BM3D        34.0075   32.4024   31.2442   30.311    29.5213   28.7919
                 WBN         34.0384   32.4566   31.2517   30.3472   29.5506   28.8064
                 Proposed    34.0415   32.4734   31.2570   30.3613   29.5712   28.8115
                 DRUNet      34.2632   32.7016   31.6367   30.7665   30.0146   29.3556
p3 (512×512)     BM3D        31.3155   29.0087   27.4678   26.3278   25.4362   24.7034
                 WBN         31.3385   29.0274   27.4936   26.3414   25.4535   24.7153
                 Proposed    31.3520   29.0475   27.5102   26.3588   25.4763   24.7274
                 DRUNet      31.9992   29.7159   28.1847   27.0464   26.1412   25.3947
r3 (512×512)     BM3D        31.5652   29.5789   28.3481   27.4696   26.7822   26.2039
                 WBN         31.6362   29.6675   28.4277   27.5486   26.8774   26.2745
                 Proposed    31.6521   29.6836   28.4672   27.5665   26.9013   26.3020
                 DRUNet      31.8651   29.9371   28.7082   27.8162   27.1131   26.5323

Table 2.

Comparisons of the PSNRs of the noise reduction methods on the texture images of Set II.

5.2 Image restoration performance

Table 3 presents five point spread functions (PSFs) used for image restoration in the literature [11]. Each PSF was normalized to have unit 1-norm before it was used to blur an image. The performance was compared with the state-of-the-art methods IDD-BM3D and DPIR. For fair comparison, both methods used the same initial images in each experiment. The visual quality of the restored images is shown in Figure 5, where the blue and red boxes are magnifications of the highlighted areas in the image. Compared with the original images, the overall perceptual quality of the images of IDD-BM3D and DPIR appears over-smoothed, whereas graphical ADMM preserves more image details, leading to better perceptual quality. Graphical ADMM can preserve more details because it uses the multi-scale approach in treating the texture and edge regions. The wavelet persistence property allows information at coarse scales to pass to fine scales and vice versa. As a result, graphical ADMM yields sharper results in recovering singular points in images.

Blur kernel   Formulation                            Size
h1            $1 / (1 + d_x^2 + d_y^2)$              15×15
h2            Uniform                                9×9
h3            $[1, 4, 6, 4, 1]^T [1, 4, 6, 4, 1]$    5×5
h4            Gaussian ($\sigma = 1.6$)              25×25
h5            Gaussian ($\sigma = 0.4$)              25×25

Table 3.

Blur kernels for the experiments. The $d_x$ and $d_y$ in h1 are, respectively, the horizontal and vertical distances of a pixel to the center of the blur kernel.

Figure 5.

Comparisons of the deblurred images. The blue and red boxes are the magnified areas in the image. (a) The original 512×512 Boat image; (b) the blurred image with blur kernel h4; (c) the image of IDD-BM3D; (d) the image of DPIR; and (e) the image of graphical ADMM. The overall perceptual quality of our image is better since those of IDD-BM3D and DPIR are over-smoothed.

The quantitative comparison is shown in Figure 6, where the performance improvement of graphical ADMM over IDD-BM3D was measured by the ISNR (increased signal-to-noise ratio) [26]. The ISNR quantitatively assesses restored images against known ground truths. Let $y$, $x$, and $x_0$ be the vector representations of the observation, the restored image, and the ground truth, respectively; the ISNR is defined as

$$10\log_{10}\frac{\left\|y - x_0\right\|^2}{\left\|x - x_0\right\|^2}. \tag{33}$$

Figure 6.

Average and standard deviation of the ISNR gain of DPIR over IDD-BM3D and of graphical ADMM over IDD-BM3D. Each image is blurred and then corrupted with white noise of the standard deviation indicated by the noise level; the image is then deblurred, and an ISNR gain is calculated from the deblurred images. The circled point and bar of a measurement at a noise level are the average and standard deviation, respectively, of thirty ISNR gains of the natural images from Set I ((a) DPIR over IDD-BM3D and (b) graphical ADMM over IDD-BM3D) and the texture images from Set II ((c) DPIR over IDD-BM3D and (d) graphical ADMM over IDD-BM3D). As shown, the curves of the ISNR gain increase steadily and progressively as the noise level increases.

The higher the ISNR value of a restored image, the better the restoration quality of the image. The ISNR gain of graphical ADMM over that of IDD-BM3D is defined as

$$\mathrm{ISNR}(\text{graphical ADMM}) - \mathrm{ISNR}(\text{IDD-BM3D}), \tag{34}$$

and the ISNR gain of DPIR over that of IDD-BM3D is defined as

$$\mathrm{ISNR}(\text{DPIR}) - \mathrm{ISNR}(\text{IDD-BM3D}). \tag{35}$$

Figure 6a and c show the ISNR gain of DPIR over IDD-BM3D on Set I and Set II, respectively. Figure 6b and d show the ISNR gain of graphical ADMM over IDD-BM3D on Set I and Set II, respectively. Take Figure 6b as an example. At each noise level, each image in Set I was first blurred by a kernel in Table 3, and white noise was added to obtain a noisy blurred image. This procedure generated thirty noisy blurred images, since Set I contains six images and Table 3 has five blur kernels. Each noisy blurred image was deblurred, and the ISNR gain between the image obtained by graphical ADMM and that obtained by IDD-BM3D was calculated. The thirty ISNR gains were then used to calculate the mean and standard deviation shown in Figure 6. The mean ISNR gain of graphical ADMM over IDD-BM3D increases steadily and progressively as the noise level increases.
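For reference, the ISNR of Eq. (33) and the gains of Eqs. (34) and (35) amount to a few lines:

```python
import numpy as np

def isnr(y, x, x0):
    """ISNR of Eq. (33): y is the observation, x the restored image, x0 the ground truth."""
    return 10.0 * np.log10(np.sum((y - x0) ** 2) / np.sum((x - x0) ** 2))

def isnr_gain(y, x_a, x_b, x0):
    """ISNR gain of method a over method b, as in Eqs. (34) and (35)."""
    return isnr(y, x_a, x0) - isnr(y, x_b, x0)
```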

5.3 Discussions

DRUNet and DPIR are deep learning methods trained on data with noise levels σ ranging from 0 to 50. In the experiments, they give the best performance in the quantitative comparisons, but graphical ADMM is the best in visual quality. For learning-based methods, the training data are crucial and directly tied to performance. The inferred results are data-driven and not interpretable. If the training data are scarce, or the distribution of the test data differs from that of the training data, performance degrades. If artifacts occur in the results, we cannot tell how they arose, because the network is composed of many coefficients fitted to the training data. Moreover, the time cost of training is very high. These are the drawbacks of learning-based methods. Optimization-based methods, by contrast, are not limited by training data: the results are derived from an objective function and are interpretable. In addition, they are more stable in practical applications. There is thus a trade-off between learning-based and optimization-based methods.

For the optimization-based methods, the experiments have demonstrated the advantages of graphical ADMM over the compared methods in both the noise reduction and image restoration tasks. Recall that BM3D and IDD-BM3D adopt image-dependent tight frame representations. IDD-BM3D also combines the analysis and synthesis optimization methods by de-coupling the noise reduction problem and the image restoration problem. This yields a game-theoretical approach in which two formulations are used to minimize a single objective function; the solution adopted by IDD-BM3D is a Nash equilibrium point. WBN represents an image as a multi-scale probabilistic DAG and adopts belief propagation to derive the MAP solution.

The advantages of graphical ADMM lie in the context-dependent decompositions of an image, horizontally in space and vertically along the scales, in handling image details. The spatial decomposition allows our method to overcome WBN's drawback of under-smoothing the smooth areas while keeping its strength of preserving sharp edges. Meanwhile, graphical ADMM is much more efficient than the time-consuming belief propagation adopted in WBN.

The mixture of data-dependent and data-independent edges in the wavelet graph construction is a significant feature of our method. The intra-scale edges are determined by a data-dependent adaptive process, which imposes sparseness by keeping in the optimal graph the edges whose end nodes have similar coefficients. The inter-scale edges are data-independent and built-in to leverage the wavelet persistence property. The inter-scale edges, passing information about singularities from finer scales to coarser scales and vice versa, can preserve more texture and edges of the original images. This distinguishes our algorithm from BM3D and IDD-BM3D, which encode structure in the atoms of a dictionary and select a few atoms for image representation.


6. Conclusions

We present a novel approach by combining spatial decomposition, vertical (multi-scale) decomposition, and ADMM optimization in a graphical framework for image noise reduction and restoration tasks. The graphical ADMM method has demonstrated that its results are superior to those of state-of-the-art algorithms. We also demonstrated that mixing data-dependent and data-independent structures in a graph representation can leverage the sparseness and persistence of a wavelet representation. Rather than adopting a combinatorial approach to derive an optimal graph, we showed that the graph can be derived by a numerically tractable optimization approach. In addition, we showed that the optimization problem is well coupled with our graph representation, and can be decomposed into a sequence of convex sub-problems, with each having an efficient closed-form solution. This opens a new perspective of combining a mixture of data-adaptive and data-independent structures, hierarchical decomposition, and optimization algorithms in modeling, representing, and solving more image processing tasks.

References

  1. Rudin L, Osher S, Fatemi E. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena. 1992;60(1–4):259-268. DOI: 10.1016/0167-2789(92)90242-F
  2. Buades A, Coll B, Morel J. A review of image denoising algorithms, with a new one. Multiscale Modeling & Simulation. 2005;4(2):490-530. DOI: 10.1137/040616024
  3. Elad M, Aharon M. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing. 2006;15(12):3736-3745. DOI: 10.1109/TIP.2006.881969
  4. Chatterjee P, Milanfar P. Is denoising dead? IEEE Transactions on Image Processing. 2010;19(4):895-911. DOI: 10.1109/TIP.2009.2037087
  5. Li J, Shen Z, Yin R, Zhang X. A reweighted l2 method for image restoration with Poisson and mixed Poisson-Gaussian noise. Inverse Problems and Imaging. 2015;9(3):875-894. DOI: 10.3934/ipi.2015.9.875
  6. Dabov K, Foi A, Katkovnik V, Egiazarian K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing. 2007;16(8):2080-2095. DOI: 10.1109/TIP.2007.901238
  7. Ho J, Hwang W. Wavelet Bayesian network image denoising. IEEE Transactions on Image Processing. 2013;22(4):1277-1290. DOI: 10.1109/TIP.2012.2220150
  8. Zhang K, Li Y, Zuo W, Zhang L, Gool L, Timofte R. Plug-and-play image restoration with deep denoiser prior. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2021. DOI: 10.1109/TPAMI.2021.3088914
  9. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention. 2015:234-241. DOI: 10.1007/978-3-319-24574-4_28
  10. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition. 2016:770-778. DOI: 10.1109/CVPR.2016.90
  11. Danielyan A, Katkovnik V, Egiazarian K. BM3D frames and variational image deblurring. IEEE Transactions on Image Processing. 2012;21(4):1715-1728. DOI: 10.1109/TIP.2011.2176954
  12. Zibulevsky M, Elad M. L1-L2 optimization in signal and image processing. IEEE Signal Processing Magazine. 2010;27(3):76-88. DOI: 10.1109/MSP.2010.936023
  13. Heckerman D. A tutorial on learning Bayesian networks. In: Innovations in Bayesian Networks. Berlin/Heidelberg: Springer; 2008. pp. 33-82. DOI: 10.1007/978-3-540-85066-3_3
  14. Chickering D, Heckerman D, Meek C. A Bayesian approach to learning Bayesian networks with local structure. In: Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence. 1997:80-89. Available from: https://dl.acm.org/doi/pdf/10.5555/2074226.2074236
  15. Cai J, Osher S, Shen Z. Split Bregman methods and frame based image restoration. Multiscale Modeling & Simulation. 2009;8(2):337-369. DOI: 10.1137/090753504
  16. Boyd S, Parikh N, Chu E, Peleato B, Eckstein J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning. 2011;3(1):1-122. DOI: 10.1561/2200000016
  17. Hoiem D, Efros A, Hebert M. Geometric context from a single image. IEEE International Conference on Computer Vision. 2005;1:654-661. DOI: 10.1109/ICCV.2005.107
  18. Ji H, Luo Y, Shen Z. Image recovery via geometrically structured approximation. Applied and Computational Harmonic Analysis. 2016;41(1):75-93. DOI: 10.1016/j.acha.2015.08.012
  19. Simoncelli E. Bayesian denoising of visual images in the wavelet domain. Bayesian Inference in Wavelet Based Models. 1999;141:291-308. DOI: 10.1007/978-1-4612-0567-8_18
  20. Mallat S, Hwang W. Singularity detection and processing with wavelets. IEEE Transactions on Information Theory. 1992;38(2):617-643. DOI: 10.1109/18.119727
  21. Milanfar P. A tour of modern image filtering: New insights and methods, both practical and theoretical. IEEE Signal Processing Magazine. 2013;30(1):106-128. DOI: 10.1109/MSP.2011.2179329
  22. Sreedevi P, Hwang W, Lei S. An examplar-based approach for texture compaction synthesis and retrieval. IEEE Transactions on Image Processing. 2010;19(5):1307-1318. DOI: 10.1109/TIP.2009.2039665
  23. Mairal J, Elad M, Sapiro G. Sparse representation for color image restoration. IEEE Transactions on Image Processing. 2008;17(1):53-69. DOI: 10.1109/TIP.2007.911828
  24. Daubechies I, Defrise M, De-Mol C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics. 2004;57(11):1413-1457. DOI: 10.1002/cpa.20042
  25. The USC-SIPI Image Database, Signal and Image Processing Institute, University of Southern California. Available from: https://sipi.usc.edu/database/
  26. Hanif M, Seghouane A. Blind image deblurring using non-negative sparse approximation. IEEE International Conference on Image Processing. 2014. DOI: 10.1109/ICIP.2014.7025821

Notes

  • If the wavelet is normalized to have unit 1-norm, then the exponent in Eq. (4) should be $\alpha$.
  • Let $\lambda = (\lambda_{h,l})$. Then, $\lambda \ge 0$ if and only if $\lambda_{h,l} \ge 0$ for all $h$ and $l$.
  • In Eq. (24), the second term is zero when $2^s$ is the coarsest scale.
  • In Eq. (25), the second term is zero when the nodes are at the finest scale.
