Engineering » Biomedical Engineering » "Applied Biomedical Engineering", book edited by Gaetano D. Gargiulo, Co-editor: Alistair McEwan, ISBN 978-953-307-256-2, Published: August 23, 2011 under CC BY-NC-SA 3.0 license

Chapter 19

Biomedical Image Volumes Denoising via the Wavelet Transform

By Eva Jerhotová, Jan Švihlík and Aleš Procházka
DOI: 10.5772/20256

Article top

Overview

1. A CTimage of the brain (a), a subset of CT images (b) from the same examination, and the Shepp-Logan phantom image (c)
Figure 1. 1. A CTimage of the brain (a), a subset of CT images (b) from the same examination, and the Shepp-Logan phantom image (c)
2. The subband coding algorithm for computing the 1-dimensional DWT
Figure 2. 2. The subband coding algorithm for computing the 1-dimensional DWT
 The 1-dimensional DTCWT decomposition scheme
Figure 3. The 1-dimensional DTCWT decomposition scheme
The 3-dimensional DWT computation steps for a single decomposition level
Figure 4. The 3-dimensional DWT computation steps for a single decomposition level
The 2-diomensional DWT (left) and DTCWT (right) decomposition up to level 2 for a cropped biomedical image
Figure 5. The 2-diomensional DWT (left) and DTCWT (right) decomposition up to level 2 for a cropped biomedical image
The wavelet shrinkage procedure demonstrated on a cropped biomedical image decomposed by the DWT to the second level
Figure 6. The wavelet shrinkage procedure demonstrated on a cropped biomedical image decomposed by the DWT to the second level
The intensity profile of a typical CT slice (a)andexamples of selected background patches (marked by the three rectangles at the top of the image) (b)
Figure 7. The intensity profile of a typical CT slice (a)andexamples of selected background patches (marked by the three rectangles at the top of the image) (b)
The normalized histogram of the analyzed noise fitted with the GLM (υ= 1.89, s = 3.02, μ = -0.34) (a) and the GMM (σ1n=2.10, σ2n=2.77, α =0.89) (b)
Figure 8. The normalized histogram of the analyzed noise fitted with the GLM (υ= 1.89, s = 3.02, μ = -0.34) (a) and the GMM (σ1n=2.10, σ2n=2.77, α =0.89) (b)
9.A selection of the intensity profile of a selected CT slice presenting the noisy image (D(n) = 14.42) (a), the result of denoising using the UDWT (D(n) = 0.69; Daubechies 6) (b), using the DWT (D(n) = 1.11; Le Gall biorthogonal filters) (c), and using the DTCWT (D(n) = 0.94; Le Gall biorthogonal filters at level 1 and qshift filters beyond level 1) (d)
Figure 9. 9.A selection of the intensity profile of a selected CT slice presenting the noisy image (D(n) = 14.42) (a), the result of denoising using the UDWT (D(n) = 0.69; Daubechies 6) (b), using the DWT (D(n) = 1.11; Le Gall biorthogonal filters) (c), and using the DTCWT (D(n) = 0.94; Le Gall biorthogonal filters at level 1 and qshift filters beyond level 1) (d)
The PDFs of the complex wavelet coefficients magnitudes for the noisy observation (a) and the noise (b) including the noise histogram
Figure 10. The PDFs of the complex wavelet coefficients magnitudes for the noisy observation (a) and the noise (b) including the noise histogram
Denoising using soft thresholding of a selected CT slice presenting the noisy image slice (D(n) = 14.42) (a), the result of denoising usingthe UDWT (D(n) = 0.69; Daubechies 6) (b), using the DWT (D(n) = 1.11; Le Gall biorthogonal filters) (c), and using the DTCWT (D(n) = 0.94; Le Gall biorthogonal filters and qshift filters)(d)
Figure 11. Denoising using soft thresholding of a selected CT slice presenting the noisy image slice (D(n) = 14.42) (a), the result of denoising usingthe UDWT (D(n) = 0.69; Daubechies 6) (b), using the DWT (D(n) = 1.11; Le Gall biorthogonal filters) (c), and using the DTCWT (D(n) = 0.94; Le Gall biorthogonal filters and qshift filters)(d)
 The results of the BayesShrink method using the non-Gaussian noise models and soft thresholding presenting the noisy image (a), the result of using the DTCWT (b) and the DWT (c) and the corresponding difference images of D(n) = 2.87 (d) and D(n) = 0.6 (e) of the normalized intensities between [-10; 10]
Figure 12. The results of the BayesShrink method using the non-Gaussian noise models and soft thresholding presenting the noisy image (a), the result of using the DTCWT (b) and the DWT (c) and the corresponding difference images of D(n) = 2.87 (d) and D(n) = 0.6 (e) of the normalized intensities between [-10; 10]
The Shepp-Logan phantom with added noise (RMSE = 22.7) (a) and the denoised image using the UDWT with the Daubechies6 wavelet(RMSE = 5.5) (b)and the respective intensity profiles (c) and (d)
Figure 13. The Shepp-Logan phantom with added noise (RMSE = 22.7) (a) and the denoised image using the UDWT with the Daubechies6 wavelet(RMSE = 5.5) (b)and the respective intensity profiles (c) and (d)
The wavelet-based HMT hierarchy for 2-dimensional DWT, where each parent node p(i) has four children i
Figure 14. The wavelet-based HMT hierarchy for 2-dimensional DWT, where each parent node p(i) has four children i
 The results of the HMT training for the CT image in Fig. 16 presenting histograms of the detail coefficients subbands from level 1 LH1 (a), HL1 (b), and HH1 (c), respectively and the GMM of conditional probabilities
Figure 15. The results of the HMT training for the CT image in Fig. 16 presenting histograms of the detail coefficients subbands from level 1 LH1 (a), HL1 (b), and HH1 (c), respectively and the GMM of conditional probabilities
2 Denoising a selected CT image using the 2-level DWT with the LeGall biorthogonal filters depicting the observed image (a), its denoised equivalent, and the difference image (c) of the intensity range [-10,10]
Figure 16. 2 Denoising a selected CT image using the 2-level DWT with the LeGall biorthogonal filters depicting the observed image (a), its denoised equivalent, and the difference image (c) of the intensity range [-10,10]

Biomedical Image Volumes Denoising via the Wavelet Transform

Eva Jerhotová1, Jan Švihlík 1 and Aleš Procházka1

1. Introduction

Image denoising represents a crucial initial step in biomedical image processing and analysis. Denoising belongs to the family of image enhancement methods (Bovik, 2009) which comprise also blur reduction, resolution enhancement, artefacts suppression, and edge enhancement. The motivation for enhancing the biomedical image quality is twofold. First, improving the visual quality may yield more accurate medical diagnostics, and second,analytical methods, such as segmentation and content recognition, require image preprocessing on the input.

Gradually, noise reduction methods developed in other research fields find their usage in biomedical applications. However, biomedical images, such as images obtained from computed tomography (CT) scanners, are quite specific. Modelling noise based on the first principles of image acquisition and transmissionis a too complex task (Borsdorf et al., 2009), and moreover, the noise component characteristics depend on the measurement conditions (Bovik, 2009).

Additionally, noise reduction must be carried out with extreme care to avoidsuppression of the important image content. For this reason, the results of biomedical image denoising should be consulted with medical experts.

media/image1.jpeg

Figure 1.

1. A CTimage of the brain (a), a subset of CT images (b) from the same examination, and the Shepp-Logan phantom image (c)

There is a range of noise reduction techniques applied to images either in the spatial domain or in a selected transform domain (Motwani et al., 2004). The former include linear or nonlinear low-pass filters, such as moving average filters, the Wiener filter or a variety of median filters. Left alone some recent developments of weighted median, the space domain techniques generally remove the majority of noise, but on the other hand, also blur sharp image edges. To overcome these problems, it is advantageous to use other domains for image representation and processing.

An optimal representation should capture key features of the signal in a relatively small number of large transform coefficients while the majority of the coefficients should be very small or zero. In other words, the representation should be sparse( Mallat, 2009) . In this respect, the wavelet domain is a good choice. Both for signals with spatial transients and narrow-band frequency content, the wavelet transform represents a good compromise between the frequency and the spatial domain representations. Furthermore, for real-world images comprising both spatial transients such as edges and narrow-band components such as regular texture regions, the wavelet transform with the space-scale (or space-frequency) representation outperforms the other two .

Aiming at reducing noise in CT images, researchers focus on optimizing the filtered back-projection (FBP) algorithm used in CT-scanners for image reconstruction. For instance, You & Zeng (2007) propose an improved FBP algorithm, which incorporates the Hilbert transform, and Zhu & StarLack (2007) propose a method for predicting the noise variance and subsequently adapting weights and kernels of the FBP algorithm. Similar to other researches, they exploit a simple Shepp-Logan phantom image which is depicted in Fig. 1c. In their experiments, noise with various statistical properties is introduced into the projection data, and then the result reconstructed by the reconstruction algorithm is evaluated.

Some researchers study the possibilities of the wavelet transform in biomedical image denoising and compression. They most commonly use the critically sampled discrete wavelet transform (DWT). Khare & Shanker Tiwary(2005) utilise the dual-tree complex wavelet transform (DTCWT) for adaptive shrinkage. Bosdorf et al. (2008) propose a wavelet-based correlation analysis method applied to pairs of disjoint projections in dual-source CT-scanners in oder to extract and eliminate uncorrelated noise. The authors use both the DWT and undecimated DWT (UDWT) and in conclusion, indicate their preference for the computation efficiency of the DWT over the slightly better results quality produced by the UDWT. Wang & Huang (1996) and Wu & Qiu (2005) study the use of volumetric processing of biomedical image slices and its advantages in comparison with slice-by-slice processing.

In this chapter, we shall focus onwavelet-based techniques for noise reduction in image volumes produced by a standard CT-scanner (see Fig. 1). First, we shalldescribe the wavelet transform focusing on the DWT, the UDWT, the DTCWT, and their one-dimensional and multi-dimensional implementations. Second, we shall carry out the noise analysis on selected CT data sets. Final, we shalldeal withwavelet-based denoising methods comprising wavelet coefficient thresholding methods (VisuShrink, SureShrink, and BayesShrink) and statistical modelling methods (the hidden Markov trees, the Gaussian mixture model, and the generalized Laplacian model) utilizing the Bayesian estimator.

2. Wavelet transform

The wavelet transform analyzes signals at multiple scales by changing the width of the analysis window, and produces their scale-space representation (Mallat, 2009). In contrast to the short-time Fourier transform, the wavelet transform deals with the limitations ofuncertainty principle in a way that is more convenient for most real-world signals. This principle definesthe trade-off between the length of the slidingwindow (i.e. the spatial resolution) and the distance between adjacent spectral lines (i.e. the frequency resolution). The wavelet transform exploits longer windows (i.e. better frequency resolution) for lowfrequency componentsof the signal, which are, nonetheless, usually of long duration, and shorter windows (i.e. better spatial resolution) for high frequency components of short duration. An important aspect of wavelet-based denoising is transform selection. In the following subsections, we shall discus the critically sampled DWT and two examples of a redundant wavelet transform with improved properties.

2.1. Discrete wavelet transform

The DWT is probably the most popular type of the wavelet transform in the signal and image processing field. This transform has become successfulprimarily in compression,as it became part of the JPEG2000 standard. This transform is computed via thesubband coding algorithm designed by. As displayed in Fig. 2, at each decomposition stage, the transform produces detail coefficients and approximation coefficients corresponding respectively to the upper half and the lower half of the input signal spectrum. The approximations then become an input of the next level.

media/image2.jpg

Figure 2.

2. The subband coding algorithm for computing the 1-dimensional DWT

The detail coefficients cH(j)at stage j are produced throughconvolution of the approximations cL(j-1) from the previous level (or the original signal when j=1) with the band-pass filterh1 (derived from the wavelet function) and subsequent down-sampling by a factor of 2. The convolution is given as

cHj(k)=k=-h1n-2kcLj-1n.

Similarly, the low frequency approximations cL(j) at level j are produced by the convolution with the low-pass filter h0 (derived from the scaling function).

cLj(k)=k=-h0(n-2k)cLj-1n.

Due to down-sampling, the approximation vector is rescaled prior to entering the next decomposition level, while the filter taps are preserved unchanged.

We may also synthesize the original input signal from the decomposition coefficients. The inverse DWT is given as

cLj-1k=n=-g0k-2ncLjn+n=-g1(k-2n)cHjn,

where g0 and g1 are respectively the low-pass and the band-pass reconstruction filters. Before computing the convolution, the subband coefficients are up-sampled by 2 by inserting zero-valued samples.

The decomposition and reconstruction filter banks are designed as orthogonal or bi-orthogonal (or dual) bases . The limitation of the orthogonal solution is that the associated real-valued wavelet function cannot have compact support and be symmetrical at the same time, except the Haar wavelet. However, the Haar wavelet is not smooth. Bi-orthogonal solutions provide more construction freedom and allow for smoothness and symmetry. Symmetrical filters have a linear phase response, and hence do not cause phase distortion to which the human eye is particularly sensitive. Consequently, bi-orthogonal wavelets are nowadays probably the most widely used.

The DWT is the most computationally efficient of all wavelet transforms. However, critical sampling causes significant drawbacks. This transform lacks shift-invariance, zero-crossings often appear at the locations of signal singularities, and altering wavelet coefficients (for instance, during denoising) causes artefacts in reconstructed images. To overcome these problems, we may use a redundant wavelet representation.

2.2. Redundant wavelet transforms

In this section, we describe two redundant wavelet transforms: the undecimated discrete wavelet transform (UDWT) and the dual-tree complex wavelet transform (DTCWT) designed by Kingsbury & Selesnick (Selesnick et al., 2005).The former is calculated through the subband coding algorithm exploitingthe same filters as the DWT. The distinction lies in omitting the down-sampling step in decomposition. Instead, the decomposition filters are up-sampled at each stage (Starck et al., 2007). As a result, the UDWT is shift invariant which means that a shift in the input signal corresponds to a shift in the transform output and does not cause any other changes in the coefficient values. On the other hand, leaving out the down-sampling step yields a significant computation burden. The redundancy of this transform depends on the number of decomposition levels L and is given as [2d-1 L+1]:1, where d denotes the number of dimensions. As a result, for 2-dimensional decomposition, the redundancy of this transform with respect to the DWT is 4:1 at stage 1, increases to 7:1 at stage 2, further increases to 10:1 at stage 3, etc.

In comparison with the UDWT, the DTCWT exhibits relatively moderate redundancy, which is 2d:1 with respect to the DWT. In contrast to the UDWT, this ratio does not increase with the number of stages. To illustrate the redundancy, the DTCWT produces twice as many coefficient values than the DWT in 1-dimensionalspace (i.e. 2 subbands of complex coefficients composed of the real and the imaginary parts).

The DTCWT is realized with FIR (finite impulse response) filters forming a tight frame, and is thus easily revertible (unlike, for instance, the Gabor transform). As displayed in Fig. 3, this transform employs two DWT-like trees a and b producing respectively the real parts ca and the imaginary parts cb of the complex wavelet coefficients c=ca+jcb,where j=(-1).

media/image3.jpg

Figure 3.

The 1-dimensional DTCWT decomposition scheme

It may seem surprising that a real signal is converted into the complex wavelet representation by using real-valued filters. This is possible thanks to the Hilbert transform built into each transform stage. Ideally, the complex scaling function ϳt=ϳa(t)+jϳb(t) and the wavelet function Ϧt=Ϧa(t)+jϦb(t) should be analytic, which means that their respective real and imaginary parts constitute Hilbert transform pairs, so as ϳb(t)=H{ϳa(t)} and Ϧb(t)=H{Ϧa(t)}. In the Fourier domain, this is equivalent to

φbϧ=-jsign(ϧ)φaϧ,

where sign denotes the sign function and ω the angular frequency. (The same relation applies also to the scaling function). As a consequence, the spectrum of the analytic wavelet is single-sided with zero magnitudes for negative frequencies. This property directly implies shift invariance, no aliasing, and the ability to isolate singularities of positive and negative directions in higher dimensions.

As the scaling and the wavelet function should be analytic, the filters associated with the real and the imaginary part of these functions must be delayed from each other by half a sample period

h0b(n)=h0a(n-0.5)
,

where h0a and h0b are the low-pass filters of tree a and tree b, respectively. However, exact analycity cannot be achieved by functions with compact support. In other words, the Hilbert transformer is of infinite length and may not be exactly implemented with an FIR filter. Consequently, the wavelet and the scaling filters used in the DTCWT are only approximately analytic and shift-independent, and approximately fulfil condition (5).

In this chapter, we use the q-shift filter solution for the DTCWT by (Kingsbury, 2003). To achieve a higher degree of analycity at lower decomposition levels, this solution exploitsdifferent filters at level 1 than at higher levels. For level 1, it is possible to use the same filter set for both trees as long as it provides perfect reconstruction (for instance, one of the biorthogonal filter sets designed for the DWT). The only difference is that the filters in one tree are translated by one sample from the corresponding filters in the other tree

h0bon=h0aon-1,

and also each to the other within the same tree (as indicated by the value in brackets in Fig. 3). Beyond level 1, we employ the approximately analytic q-shift filters, for which the low-pass (and also the high-pass) filters from opposite trees are time-reversed versions of each other

h0bn=h0aN-1-n
.

A similar relation applies also to the analysis and synthesis filters. These filters are of even length and approximately symmetrical, as their point of symmetry is ¼-sample away (i.e. q-shifted) from the centre. Hence, the filters exhibit a q-sample group delay and their individual phase response is not exactly linear. On the other hand, the assymmetry makes the orthonormal perfect reconstruction feasible. For the overall complex wavelet and also the scaling function, the conjugate phase response is exactly linear and the magnitude is approximately shift-invariant. In addition to the shift-invariance property, theDTCWT provides better directional selectivity than both the DWT and UDWT in multiple dimensions.

2.3. Multidimensional wavelet transform

Computing the multidimensional wavelet transform is straightforward due to its separability. This property implies that the n-dimensional (nD) transform may be implemented as n consecutive 1D transform in different directions as illustrated in Fig. 4 for the DWT. For n=3, we may proceed for instance in the following order. First, each slice of the image volume is processed in the row direction resulting into the low frequency coefficients (i.e. the approximations cL) and the high frequency coefficients (i.e. the details cH). The resulting 1D decomposed slices are then processed in the column directionyielding the 2D transform of 4 coefficients subbands (cLL, cLH, cHL, and cHH) for each mage slice. Finally, the set of the 2D transform coefficients matrices is processed in the between-slice direction producing 8 subbands (cLLL,cLLH,cHLL, …, and cHHH). The coefficients cLLL constitute an input to the next level of the 3D transform .

media/image4.jpeg

Figure 4.

The 3-dimensional DWT computation steps for a single decomposition level

Please note that for the UDWT, the procedure is identical except that the size of the 3-dimensional cube depicted in Fig. 4 doubles its size in the respective direction at each step. For the DTCWT, the multidimensional decomposition procedure is similar except producing a different number of subbands. For instance, the 2D DTCWT produces 8 subbands of complex-valued coefficients which correspond to 4:1 redundancy w.r.t. the 2D DWT of 4 subbands of real-valued coefficients. Despite being separable, the 2D DTCWT is truly directional. Its 6 directional subbands separate positive and negative singularity orientations(-75°, -45°, -15°, +15°, +45°, +75°). In contrast, the separable DWT mixes the negative and positive orientations together in its 3 subbands (0°, ±45°,±90°). The improved directional selectivity in higher dimensions represents another advantage over both the criticallysampled DWT and UDWT.

media/image5.jpeg

Figure 5.

The 2-diomensional DWT (left) and DTCWT (right) decomposition up to level 2 for a cropped biomedical image

The volumetric wavelet transform does not necessarily need be uniform in all three directions. We may for instance use longer filters within the slices and shorter filters in the direction between slices . The rationale behind this lies in the variance of the spatial resolution in different directions. In the between-slice direction, the resolution is coarser than in the intra-slice directions (depending on the slice thickness and spacing).

3. Wavelet-based denoising methods

As proved in a range of signal processing research areas, the wavelet transform is a suitable representation for estimating noise free images from their noisy observations owing to its sparsity and multiscale nature.As outlined in Fig. 6, denoising is based on image

media/image6.jpg

Figure 6.

The wavelet shrinkage procedure demonstrated on a cropped biomedical image decomposed by the DWT to the second level

transformation into the wavelet domain and subsequent reconstruction from the altered detail coefficients and unchanged approximation coefficients from the last decomposition level. This procedure is called wavelet shrinkage and is associated with two main-stream approaches, which are discussed in this section: coefficient thresholding and probabilistic coefficient modelling.

3.1. Noise analysis

The quality of CT images depends directly on the radiation dose from an examination. The dose is influenced by the following quantities : X-ray tube current, exposure time, beam energy, slice thickness, table speed, type of the reconstruction algorithm, focal-spot-to-isocenter distance, detector efficiency, etc. Overall, CT image acquisition is a complex process affected many factors, such as the post-processing algorithm and nonlinearities of several parts of the device. By minimizing the radiation exposure of the patient, the amount of noise and artefacts increases.

Noise analysis represents a fundamental initial step of advanced noise suppression algorithms. In common devices, such as cameras with CCD (charged coupled device) or CMOS (complementary metal-oxide-semiconductor) sensors, noise analysis is based on acquisition of a testing pattern, which contains several patches with constant greyscale levels ranging from black to white . However, CT image volumes are a specific case and more realizations of the same acquisition process cannot be obtained so simply. One option would be to use phantoms (for instance water phantoms), which mimic the consistence of the human body (basically - bone, tissue, water, and air). It is then possible to analyze the noise component by using several images generated by scanning the phantom. Another option of obtaining the data for noise analysis would be to acquire the same volumetric slice twice during a regular examination. However, this would increase the radiation dose for the patient and the noise would be still impacted by motion-generated artefacts (e.g. from breathing). To prevent from any motion, the patient head would have to be tightly fixed which is not applicable.

media/image7.jpeg

Figure 7.

The intensity profile of a typical CT slice (a)andexamples of selected background patches (marked by the three rectangles at the top of the image) (b)

New interesting possibilities arise with introduction of multi-detector scanners. Bosdorf et al. (2008) analyse noise in images from the latest generation dual-source CT-scanners, which produce two images of the same slice (one from each detector). The authors propose to extract the uncorrelated noise component via wavelet-based correlation analysis of the corresponding images. According to their findings, the noise in CT images is non-white, usually of an unknown distribution, and not stationary.

In our experiments, we analyzed images from a standard CT-scanner and without possessing the corresponding phantom images for noise analysis. In order to analyze noise characteristics, we selected patches of the background, i.e. the part of the image capturing no part of the patient’s head or the bed as apparent from Fig. 7b with an altered histogram. Nevertheless, this type of analysis does not reveal whether the noise is signal-dependent or not.

Even thoughwe are aware that the noise is not strictly additive, we assume the additive noise model

y= x+n

for the sake of simplicity (y denotes the observed signal, x the noise-free signal, and n independent noise).We analyze the noise for each image of the volume individually, since the noise variance changes considerably from slice to slice. Fig. 7a shows a typical intensity profile of a CT image.

As we demonstrate below, the noise obtained from the background patches as well as the observed signal may be modelledusing the GLM (generalized Laplacian model) or the GMM (Gaussian mixture model) in the spatial domain. The parameters of these models may be estimated exploiting the method of moments. This method is based on comparing the sample moments with the theoretical moments. Let us consider samples{yi}i=1I of the observed signal and define the kth sample moment

Mk=1Ii=1Iyik,1k

and the theoretical moment

mk=-ykp(y;ϖ1,ϖ2,...,ϖm)dy,

wherep(y)denotes the probability density function of y.Theparameters ϖ1,ϖ2,...,ϖmof the probability distribution are estimated through the following system of equations

Mk=mk,1km.

3.1.1 Generalized Laplacianmodel of the noise

Every background patch is analyzed in the following way. We compute an optimized histogram to obtain the PDF (probability density function) of the noise, which is then tested for normality. The quality of the histogram shape depends greatly on the bin widthBW. There is a range of approaches for bin width optimization, such as those described in and . The bin width originally proposedby Freedman & Diaconis can be written as

BW=2n0.75-n0.25I-13,

where the term in the parentheses denotes a so-called interquartile range between the 75th percentile n_(0.75) and the 25th percentile n_(0.25). In our experiments, the Kolmogorov-Smirnov test rejected the null hypothesis that the samples come from the Gaussian distribution at the significance level Ϗ=0.05 in most of the tested images.

Hence, it is necessary to employ a more general model. We choose a model with heavytails given by

pnn;Ϛ,s,ϛ=e-n-ϚsϛZs,ϛ,n-;,

where μ denotes the mean value, the parameter ϛ presents generalization in the sense of the PDF shape, and the parameter s controls the PDF width. The function Z(s,ϛ)=2sϛα1ϛ, where α(r)=0tr-1e-tdt is the gamma function, normalizes the exponential to a unit area. The PDF parameters may be estimated by using the system of moment equations. For simplicity we consider noise n with μ = 0. The second and the fourth moment of the noise n are given as

m2n=s2α3ϛα1ϛ,m4n=s4α5ϛα1ϛ,

As proposed by Simoncelli & Adelson (1996), parameters estimation may be simplified using the kurtosis

Ϙn=m4(n)m22(n)=α5ϛα1ϛα23ϛ.

The above described model is widely known as the generalized Laplacian model (GLM). This model is commonly used to model filtered images, such as the wavelet coefficients of high frequency bands. The histogram and the modelled PDF for a selected background patchare depicted in Fig. 8a using the logarithmic scale, which clearly illustrates the quality of the fit.

media/image8.jpeg

Figure 8.

The normalized histogram of the analyzed noise fitted with the GLM (υ= 1.89, s = 3.02, μ = -0.34) (a) and the GMM (σ1n=2.10, σ2n=2.77, α =0.89) (b)

For denoising, we transform images into the wavelet domain. In case of the Gaussian-distributed noise, the noise parameters are preserved unaffected by the transformation. In contrast, forthe non-Gaussian noise, the parameters change and thusneed be re-estimated.We may either estimate the parameters directly in the wavelet domain (e.g. via the method of moments), or alternatively,transform the moments into the wavelet domain.

3.1.2. Gaussian mixturemodel of the noise

Another possibility of modelling noise in the selected background patches is the Gaussian mixture model (GMM). This model is generally given by a mixture of a certain number of Gaussians with the variancesσkn and the mean valuesμkn

p(n)=k=1KϏknN(n;Ϛkn,ϡkn2),

whereαkn are the proportions of the mixturesatisfying the constraint k=1KϏkn=1. As a compromise between solvability of the moment equations system and quality of the fit, we set K = 2. The GMM is than given by

pn=ϏnNn;Ϛ1n,ϡ1n2+1-ϏnNn;Ϛ2n,ϡ2n2,

where the mean valuesμkn are assumed to equal zero.

To estimate the model parameters in the wavelet domain, we may use the system of moment equations employing the second and the fourth central moment derived in (Švihlík, 2009). The noise parameters (the variancesϡ1N, ϡ2N and the mixture proportion ϏN)may be directly estimated from the wavelet coefficientsN = WT{n} as follows.First, the varianceϡ1N is estimated asϡ1N=N0.999/3, where N0.999denotes the 99.9th percentile. Second, the proportion ϏNis estimated from kurtosis

ϘN=m4(N)m22(N)3ϏN,for ϏN-10,

where m4(N) and m2(N) are the central moments of N. And final, the remaining model parameterϡ2N is then given as (Švihlík, 2009)

ϡ2N2=m2(N)-ϏNϡ1N21-ϏN.

Using the logarithmic scale, Fig. 8b displays the result of fitting the model to the histogram of the selected background patch.

3.2 Wavelet coefficient thresholding

The method of wavelet coefficients thresholding is based on suppressing low-energy detail coefficients which are presumed to noise-dominated. To do this, we may choose different thresholding functions. The basic two thresholding function types are the hard thresholding and the soft thresholding function. For the former, the coefficients generated by alteringthe observed real-valued coefficients{Yi}i=1I are given by

Yithr=Xi̠=Yi0 forYi>ϒ(h)otherwise

where ϒ(h)R0 stands for the hard threshold limit and Xi̠ denotes the estimated coefficients of the noise-free signal. For soft thresholding, the thresholded coefficients are given as

Yithr=Xi̠=signYi(|Yi|-ϒ(s))0 forYi>ϒ(s)otherwise

where ϒ(s)R0 stands for the soft threshold limit. In case of complex-valued coefficients, we threshold the magnitudes and keep the phase Yi unchanged. For instance, the soft thresholding formula remains almost the same except that the magnitude Yi=Yia2+Yib2replaces Yi and as a result, the sign function may be omitted. We then use the thresholded magnitudes to obtain the complex thresholded coefficient

Yithr=Xi̠=|Yi|ejYi

Hard thresholding preserves the coefficient with the values greater than the threshold. This may, however, introduce discontinuities in the coefficients values, which may result in artefacts. In contrast, the soft thresholding function does not introduce artefacts owing to being continuous. This function complies with the assumption that noise is distributed evenly in all coefficients. However, when this is not the case, this technique reduces also the values of the coefficients corresponding to the underlying noise-free signal, which results in edge blurring .

3.2.1. Thresholdestimationmethods for the Gaussian noise

Threshold estimation methods vary in the assumption of the noise variance uniformity for different scales and subbands and in the threshold value estimation methods which they employ. In general, orthonormal transforms preserve the statistics of the i.i.d. (independent identically distributed)Gaussian noise. That is the reason why the most widely used methods are derived with this assumption.

The VisuShrink methodassumes the noise variance the same for all thresholded subbands. The noise variance is estimated using the median absolute deviation (MAD) of the coefficients from the highest-frequency subband which is presumed to be noise-dominated.

ϡ̠MAD=median(|chh1|)0.6745
,

where the constant in the denominator corresponds to the Gaussian distribution. The primary advantage of this variance estimation method is its robustness to outliers. The estimated noise varianceis than exploited for computing the universal threshold with the same value for all levels and subbands

ϒ=ϡn2loglogL
,

where ϡn is the noise standard deviation, L is the number of signal samples and log is the natural logarithm. This threshold computation formula is derived by minimizing the probability that any noise sample will exceed the threshold limit. The resulting threshold is applied to the wavelet coefficients via the soft thresholding technique defined in (21). VisuShrink removes the vast majority of noise from the image, but also tends to over-smooth the image since common signals are not sparse enough to comply with the minmax theory. In response to these findings, Donoho & Johnstone proposed another method called SureShrink (Stein’s Unbiased Risk Estimate Shrinkage), which produces subband-adaptive thresholds and is optimal in the mean-squared error sense. They further proposed a hybrid scheme combining the above approaches, since SureShrink does not perform well for situations of extreme sparsity.

In literature, SureShrink is often compared with BayesShrink for different types of data and the two methods. BayesShrinkderives the threshold within the Bayesian framework assuming a generalized Gaussian distribution of the wavelet coefficients and the additive noise model of the observed coefficients

Y=X+N,

where N = WT{n} denotesthe noise coefficients andX = WT{x}the noise-freesignal coefficients.Hence for the corresponding variances we may write

ϡY2=ϡX2+ ϡN2
.

The mean square error in a subband may be approximated by the corresponding Bayesian squared error risk with the generalized Gaussian as the prior. The threshold which minimizes the Bayesian risk (is nearly optimal) is produced as

ϒ=ϡN2ϡX,

These parameters are estimated from the data for each subband. The noise variance ϡN is estimated via (23), and by exploiting (26), the variance of the noise-free signal ϡXis estimated as

ϡ̠X2=maxmaxϡ̠Y2-ϡ̠N2,0,

where the estimate of the observed coefficients variance is computed from each subband given as

ϡ̠Y2=1Ii=1IYi2,

while assuming the zero mean.

3.2.2. Wavelet coefficients thresholding for the non-Gaussian noise

In case that the noise analysis identifies noise to be non-Gaussian and the moment equation systems for GMM and GLM models are not well satisfied, the thresholding method must be modified. The threshold value is usually derived for the noise with the Gaussian distribution. In accordance with the outcomes of the noise analysis we proposed a simple equation for the evaluation of the threshold value of the GLM. For the universal threshold from (24), the threshold value ϒ is given by the weighted standard deviation of a given distribution(e.g. parameter ϡ for the Gaussian distribution). When considering the GLM from (13) with Ϛ=0 in the wavelet domain, the thresholding value is given by the square root of second central moment

ϒ=wm2(s,ϛ)=ws2α3ϛα1ϛ=w1Ii=1INi2

where Ni denotes the wavelet coefficients of noise n, w denotes the weight, which can be approximately in the range between 1 to 6 and it should be optimized for the acquired data.

Now let us consider the DTCWT. In case of complex-valued coefficients, we threshold the magnitudes while keeping the phase unchangedas described in (22). It is evident that the PDF of the wavelet coefficients magnitudes is asymmetric, and its mean is not zero. We consider the real and imaginary components of the complex coefficients to be i.i.d. Gaussian. Hence, the magnitude is Rayleigh-distributed. The Rayleigh PDF is given by

pN(N;ϡ)=Nϡ2e-N22ϡ2,N0
media/image9.jpeg

Figure 9.

9.A selection of the intensity profile of a selected CT slice presenting the noisy image (D(n) = 14.42) (a), the result of denoising using the UDWT (D(n) = 0.69; Daubechies 6) (b), using the DWT (D(n) = 1.11; Le Gall biorthogonal filters) (c), and using the DTCWT (D(n) = 0.94; Le Gall biorthogonal filters at level 1 and qshift filters beyond level 1) (d)

where ϡ>0. Similarly as in (30), we define the threshold value as the weighted square root of second raw moment

ϒ=wm2(ϡ)=w2ϡ2α(2),

where parameter ϡ can be estimated using maximum likelihood as follows

ϡ̠=12Ii=1I|N|i2,

where N are the complex wavelet coefficients of noise n. It is worthwhile to mention that we use the second raw moment m2(ϡ)=2ϡ2α(2) instead of second central moment m2(ϡ)=4-Ϟ2ϡ2 for the threshold value evaluation. The reason is that the optimal value of ϒ is found by changing the weight w and the mentioned two moments are both a function of parameter ϡ. The example of the estimated PDFs computed from the magnitudes of the complex wavelet coefficients at decomposition level 1 are depicted in Fig. 10. The threshold value evaluated for this case is ϒ=3.4. Hence, only a negligible part of PDF of Y are thresholded.

media/image10.jpeg

Figure 10.

The PDFs of the complex wavelet coefficients magnitudes for the noisy observation (a) and the noise (b) including the noise histogram

media/image11.jpeg

Figure 11.

Denoising using soft thresholding of a selected CT slice presenting the noisy image slice (D(n) = 14.42) (a), the result of denoising usingthe UDWT (D(n) = 0.69; Daubechies 6) (b), using the DWT (D(n) = 1.11; Le Gall biorthogonal filters) (c), and using the DTCWT (D(n) = 0.94; Le Gall biorthogonal filters and qshift filters)(d)

A selected CT slice from our image database thresholded assuming a non-Gaussian noise by using various wavelet transforms is depicted in Fig. 11. The results appear similar, except that in case of the DWT, the image is slightly over-smoothed.Fig. 9 shows the sameimage using theintensity profiles, whose sample variancesD(n)=1I-1i=1I(ni-E(n))2considerably decreased for all three implemented wavelet transforms.

Additionally, we produced another set of denoising results using the BayesShrink method described in subsection 3.2.1. As depicted in Fig. 12, we performed BayesShrink using the DTCWT and the DWT. In case of the DTCWT, we used equation (33) for computing the parameter σ of the Rayleigh distribution and also the second central moment for both the observed signal and the noise extracted from the background patches. Similarly for the DWT, we assumed the GLM and evaluated its parameters both for the noise model and the observation in the wavelet doman. In this experiment, the BayesShrink method produced a too small threshold for the DWT. For the DTCWT, the threshold seems appropriate, since the difference image appears to contain primarily noise.

media/image12.jpeg

Figure 12.

The results of the BayesShrink method using the non-Gaussian noise models and soft thresholding presenting the noisy image (a), the result of using the DTCWT (b) and the DWT (c) and the corresponding difference images of D(n) = 2.87 (d) and D(n) = 0.6 (e) of the normalized intensities between [-10; 10]

3.3. Statistical modelling of the wavelet coefficients

The. other broad family of wavelet-based noise reduction methods is based on statistical modelling of the wavelet coefficients. As demonstrated in numerous publications (Romberg et al., 2001), these methods usually outperform the thresholding techniques with respect to the result quality. On the other hand, they also yield greater computational complexity. In this subsection, we discuss two different methods: the hidden Markov trees (HMT) and two types of the marginal probabilistic models discussed above - the GMM and the GLM.

3.3.1. Bayesian estimator

The probabilistic methods utilize the Bayesian estimator to estimate the underlying signal from its noisy observation based on the a priori information. There are two basic variants of this estimator, depending on whether the estimator is designed to optimize the minimum mean square error(MMSE) or the maximum a posterior(MAP) risk function.

Once again, we assume the additive noise model of the noisy wavelet coefficients observations in (25).The conditional mean of the posterior PDF pX|Y(x|y)producesthe least square estimation ofX.The MMSEestimator (Simoncelli & Adelson, 1996)is given as

X̠Y=-+pX|Yxyxdx= -+pY|XyxpXxxdx-+pY|XyxpXxdx=-+pNy-xpXxxdx-+pNy-xpXxdx,

where pY|X(y|x) denotes a likelihood function, pX(x) represents the a priori model, and pN(x) stands for the noise model.

The MAP estimator is given by the following formula

X̠Y=argminxminxpX|Y(x|y)= argargminxminxpY|XyxpXx =argminxminxpN(y-x)pX(x).

Bayesian statistics represent a powerful signal estimation tool (Rowe, 2003). In contrast to the classical Fisher approach, which exploits only the observed data, the Bayesian approach allows subsuming the prior information into the solution and thus produces useful results even for small datasets.

3.3.2. Wavelet-based a priorimodels

The above described marginal models (the GMM and the GLM) are suitable for modelling noise in tBayesian estimators, as discussed in subsections 3.1.2 and 3.1.1, and also as the a priori model of the noise-free signal. Assuming additive noise in the wavelet domain from (25), the observed coefficients Y are given as a summation of two GMM distributions. Hence, its second and fourth theoretical central moments are given by

m2(Y)=m2(X)+m2(N),
m4(Y)=m4(X)+6 m2(X) m2(N)+m4(N)
,

where the moments of Y and Nare evaluated using the sample moments. Thekthcentral sample moments of a random variable R is given by

MkR=1Ii=1I(Ri-ER)k.

From(36) and (37),it is possible to compute themoments m2X andm4(X) of the useful signalX. Finally,the signal parameters may be estimated using the same equations as for the noiseN(i.e. (18) and (19)).The parameters estimation results highly depend on the estimation quality of the sample moments.

For the wavelet-based GLM, equations (36) and (37) are also valid.The GLM of the noise-free signal is given as

pXX;ϓ,ϙ=e-XϙϓZϙ,ϓ, X(-;)

where ϓ denotes the shape parameter and ϙ is the variance parameter. Similarly to noise estimation in (15), the parameters of the signal may be easily estimated through the kurtosis

ϘX=m4(X)m22(X)=α5ϓα1ϓα23ϓ.

Using equations (36) and (37),we obtain

ϘX=m4(Y)-m4(N)-6m2(N)(m2(Y)-m2(N))(m2(Y)-m2(N))2.

And from the second central moment we derive

ϙ=(m2(Y)-m2(N))α1ϓα3ϓ.

Again, the values of the moments for Y and Nmay be estimated from the data using the sample moments exploiting (38).

Noise suppression methods based on Bayesian estimation are generally more efficient than thethresholding methods, mainly in case of considerable noise contamination. Fig. 13 shows the Shepp-Logan phantom contaminated by generalized Laplacian noise (s=30.0, ϛ=1.78, Ϛ=0) and subsequently denoised by the MMSE estimator. The quality improvement after denoising was assessed using Root Mean Square Error (RMSE)

RMSE=1Ii=1I(xi-xi̠)2,

which was computed for the original image x and the denoised imagex̠ and also for the the original image x and the noisy imagey. The generalized Laplacian noise was generated by high-pass convolution filtering of a real-world image.

The described Bayesian estimation represents a powerful tool for image denoising. However, we presume that one of the following factors is in conflict with solvability of the denoising task: the analyzed noise isnot strictly Additive and/or signal-independent, or the derived equation systems for the GMM and the GLM (originally derived for astronomical data (Švihlík & Páta, 2008)) are not well satisfied.

3.3.3. Wavelet-based hidden Markov trees

Wavelet coefficients thresholding methods assume the wavelet transform to de-correlate the signal thoroughly. However, this is not a correct assumption. The wavelet coefficients are interrelated and exhibit persistence across scale and clustering within scale. Both these properties are captured by the hidden Markov tree (HMT) models (Baraniuk, 1999). Acording to the persistency property, the coefficient values propagate across scale from parent to child within the tree. This means that a large parent coefficient corresponding to a signal singularity should have large children coefficients, while a parent coefficient associated with a noise-related singularities should not. Clustering within scale signifies that a large (small) coefficient value is expected in the vicinity of a large (small) coefficient.

media/image13.jpeg

Figure 13.

The Shepp-Logan phantom with added noise (RMSE = 22.7) (a) and the denoised image using the UDWT with the Daubechies6 wavelet(RMSE = 5.5) (b)and the respective intensity profiles (c) and (d)

media/image14.jpeg

Figure 14.

The wavelet-based HMT hierarchy for 2-dimensional DWT, where each parent node p(i) has four children i

As depicted in Fig. 14, the HMT connects the hidden states of a child node Si and a parent node Sp(i) and not the actual coefficients values Yi, Yp(i) associated with these states. For modelling the inter-scale dependencies, the HMT uses an M-component mixture of conditional Gaussian distributions N(Ϛi,m,ϡi,m2) associated with hidden states Si=m, since the PDF of the wavelet coefficients is peaky and heavy-tailed. The overall PDF is given by

fYi=pSi=mfYiSi=m)

where pSi=m is the probability mass function (PMF) of the hidden state Si of the node i satisfying m=1MpSi=m=1 and fYiSi=m) is the conditional probability that the observed coefficients value yi given the state Si corresponds to G(Ϛi,m,ϡi,m2). For simplicity, we use the mixture of two Gaussians (M = 2).

The intra-scale dependencies are captured by the transition probabilities. For M=2, the transition probability matrix connecting the children hidden states Si given the parent state Sp(i) is given as

fSi=m|Sp(i)=n=fSi=1|Sp(i)=1fSi=1|Sp(i)=2fSi=2|Sp(i)=1fSi=2|Sp(i)=2.

The persistence property implies that f1,1 f2,1, f2,2 f1,2.

The HMT are used for denoising in the following fashion (Crouse et al., 1998). First, the model is fitted to the noise observation coefficients using the expectation maximization (EM) algorithm. This training algorithm comprises the E-step, in which the state information propagates upwards and downwards through the tree, and the M-step, in which the model parameters ϖ are recalculated and then input into the next iteration. We do not have multiple realizations of the same process. Hence, to prevent over-fitting of the model to the data, we tie the tries within subbands so as to obtain 3 independent HMT models for the 3 subbands of the 2-dimensional DWT. The results of model fitting for the first decomposition level of the DWT are shown in Fig. 15.

media/image15.jpg

Figure 15.

The results of the HMT training for the CT image in Fig. 16 presenting histograms of the detail coefficients subbands from level 1 LH1 (a), HL1 (b), and HH1 (c), respectively and the GMM of conditional probabilities

Second, the trained model is exploited as the a priori signal PDF in order to compute the conditional mean estimates of the noisy observations given the noise-free signal coefficients.

Again, we assume the additive nose model in (25), where N is the independent identically distributed Gaussian noise. The conditional mean estimate of Xi, given the observed coefficients Y and the HMT model parameters ϖis given by

EXiY,ϖ=m=1MpSi=mϡi,m2ϡn2+ϡi,m2Yi

where p(Si=m) and ϡi,m are obtained from the HMT model and ϡn is the noise standard deviation which may be estimated from the background patch or using the MAD in (23).

As example of HMT-based image denoising is displayed in Fig. 16. The wavelet-basedHMTs (Romberg et al., 2001) capture dependencies between the wavelet coefficients across scales, since image singularities, such as edges, persist across scales, while the noise-dominated coefficients lack such intra-scale consistence. This approach yields very good denoising results, however requires a computationally expensive training.

media/image16.jpeg

Figure 16.

2 Denoising a selected CT image using the 2-level DWT with the LeGall biorthogonal filters depicting the observed image (a), its denoised equivalent, and the difference image (c) of the intensity range [-10,10]

4. Conclusion

In this chapter, we described different types of the wavelet transform including both non-redundant (the DWT) and redundant representations (the DTCWT and the UDWT). In general, the DTCWT and the UDWT outperform the DWT due to their shift-invariance; however, at the expense of redundancy. The DTCWT is less redundant than the UDWT and provides better directional selectivity.

We carried out preliminary noise analysis on 2 CT-datasets. The noise component in CT images is very complex and, presumably, also non-stationary. However, for simplicity, we assumed that the noise is stationary within the image and used the additive noise model. In order to model the noise, we selected a few background patches and modelled their distributions using the generalized Laplacian model or the Gaussian mixture model. The noise models were then exploited for wavelet coefficients thresholding. This required deriving threshold estimation equations corresponding to these models and also for the assumed Rayleigh distribution of the DTCWT coefficients magnitudes. The denoising results by using each of the three described wavelet transform were demonstrated on the CT-image data.

Since the threshold-based methods tend to blur edges and cause artefacts (the extent of this problem depends based on the used wavelet transform and thresholding method), we decided to implement the probabilistic methods. In general, these methods outperform the thresholding methods in the resulting image quality (mainly in case of considerable noise contamination); however, at the expense of greater computation cost. The described Bayesian estimation represents a powerful tool for image denoising. However, we presume that one of the following factors is in conflict with solvability of the denoising task: the analyzed noise is not strictly additive and/or signal-independent, or the derived equation systems for the GMM and the GLM are not well satisfied. In this perspective, the thresholding methods due to their relative simplicity revealed greater robustness towards non-fulfilment of the assumptions of the noise characteristics. On the other hand, the hidden Markov trees (HMT) of wavelet coefficients performed well and produced good quality image results with reduced noise and unblurred edges.

In future work, we shall focus on experiments in a larger scale. We shall carry out a thorough noise analysis on more datasets (on different CT-scanners if applicable) and also compare different denoising methods both by interviewing medical experts and by designing and evaluating appropriate quality metrics. Regarding methods development, we will further focus on the probabilistic methodsexperimenting with the use of more than two components in the Gaussian mixture model, the DTCWT instead of the DWT, and possibly also volumetric denoising techniques.

Acknowledgements

This work was funded by the research grant MSM 6046137306 of the Ministry of Education, Youth and Sports of the Czech Republic.

References