## 1. Introduction

Image denoising represents a crucial initial step in biomedical image processing and analysis. Denoising belongs to the family of image enhancement methods (Bovik, 2009) which comprise also blur reduction, resolution enhancement, artefacts suppression, and edge enhancement. The motivation for enhancing the biomedical image quality is twofold. First, improving the visual quality may yield more accurate medical diagnostics, and second,analytical methods, such as segmentation and content recognition, require image preprocessing on the input.

Gradually, noise reduction methods developed in other research fields find their usage in biomedical applications. However, biomedical images, such as images obtained from computed tomography (CT) scanners, are quite specific. Modelling noise based on the first principles of image acquisition and transmissionis a too complex task (Borsdorf et al., 2009), and moreover, the noise component characteristics depend on the measurement conditions (Bovik, 2009).

Additionally, noise reduction must be carried out with extreme care to avoidsuppression of the important image content. For this reason, the results of biomedical image denoising should be consulted with medical experts.

There is a range of noise reduction techniques applied to images either in the spatial domain or in a selected transform domain (Motwani et al., 2004). The former include linear or nonlinear low-pass filters, such as moving average filters, the Wiener filter or a variety of median filters. Left alone some recent developments of weighted median, the space domain techniques generally remove the majority of noise, but on the other hand, also blur sharp image edges. To overcome these problems, it is advantageous to use other domains for image representation and processing.

An optimal representation should capture key features of the signal in a relatively small number of large transform coefficients while the majority of the coefficients should be very small or zero. In other words, the representation should be sparse( Mallat, 2009) . In this respect, the wavelet domain is a good choice. Both for signals with spatial transients and narrow-band frequency content, the wavelet transform represents a good compromise between the frequency and the spatial domain representations. Furthermore, for real-world images comprising both spatial transients such as edges and narrow-band components such as regular texture regions, the wavelet transform with the space-scale (or space-frequency) representation outperforms the other two .

Aiming at reducing noise in CT images, researchers focus on optimizing the filtered back-projection (FBP) algorithm used in CT-scanners for image reconstruction. For instance, You & Zeng (2007) propose an improved FBP algorithm, which incorporates the Hilbert transform, and Zhu & StarLack (2007) propose a method for predicting the noise variance and subsequently adapting weights and kernels of the FBP algorithm. Similar to other researches, they exploit a simple Shepp-Logan phantom image which is depicted in Fig. 1c. In their experiments, noise with various statistical properties is introduced into the projection data, and then the result reconstructed by the reconstruction algorithm is evaluated.

Some researchers study the possibilities of the wavelet transform in biomedical image denoising and compression. They most commonly use the critically sampled discrete wavelet transform (DWT). Khare & Shanker Tiwary(2005) utilise the dual-tree complex wavelet transform (DTCWT) for adaptive shrinkage. Bosdorf et al. (2008) propose a wavelet-based correlation analysis method applied to pairs of disjoint projections in dual-source CT-scanners in oder to extract and eliminate uncorrelated noise. The authors use both the DWT and undecimated DWT (UDWT) and in conclusion, indicate their preference for the computation efficiency of the DWT over the slightly better results quality produced by the UDWT. Wang & Huang (1996) and Wu & Qiu (2005) study the use of volumetric processing of biomedical image slices and its advantages in comparison with slice-by-slice processing.

In this chapter, we shall focus onwavelet-based techniques for noise reduction in image volumes produced by a standard CT-scanner (see Fig. 1). First, we shalldescribe the wavelet transform focusing on the DWT, the UDWT, the DTCWT, and their one-dimensional and multi-dimensional implementations. Second, we shall carry out the noise analysis on selected CT data sets. Final, we shalldeal withwavelet-based denoising methods comprising wavelet coefficient thresholding methods (VisuShrink, SureShrink, and BayesShrink) and statistical modelling methods (the hidden Markov trees, the Gaussian mixture model, and the generalized Laplacian model) utilizing the Bayesian estimator.

## 2. Wavelet transform

The wavelet transform analyzes signals at multiple scales by changing the width of the analysis window, and produces their scale-space representation (Mallat, 2009). In contrast to the short-time Fourier transform, the wavelet transform deals with the limitations ofuncertainty principle in a way that is more convenient for most real-world signals. This principle definesthe trade-off between the length of the slidingwindow (i.e. the spatial resolution) and the distance between adjacent spectral lines (i.e. the frequency resolution). The wavelet transform exploits longer windows (i.e. better frequency resolution) for lowfrequency componentsof the signal, which are, nonetheless, usually of long duration, and shorter windows (i.e. better spatial resolution) for high frequency components of short duration. An important aspect of wavelet-based denoising is transform selection. In the following subsections, we shall discus the critically sampled DWT and two examples of a redundant wavelet transform with improved properties.

### 2.1. Discrete wavelet transform

The DWT is probably the most popular type of the wavelet transform in the signal and image processing field. This transform has become successfulprimarily in compression,as it became part of the JPEG2000 standard. This transform is computed via thesubband coding algorithm designed by. As displayed in Fig. 2, at each decomposition stage, the transform produces detail coefficients and approximation coefficients corresponding respectively to the upper half and the lower half of the input signal spectrum. The approximations then become an input of the next level.

The detail coefficients *j* are produced throughconvolution of the approximations *j=1*) with the band-pass filter

Similarly, the low frequency approximations *j* are produced by the convolution with the low-pass filter

Due to down-sampling, the approximation vector is rescaled prior to entering the next decomposition level, while the filter taps are preserved unchanged.

We may also synthesize the original input signal from the decomposition coefficients. The inverse DWT is given as

where

The decomposition and reconstruction filter banks are designed as orthogonal or bi-orthogonal (or dual) bases . The limitation of the orthogonal solution is that the associated real-valued wavelet function cannot have compact support and be symmetrical at the same time, except the Haar wavelet. However, the Haar wavelet is not smooth. Bi-orthogonal solutions provide more construction freedom and allow for smoothness and symmetry. Symmetrical filters have a linear phase response, and hence do not cause phase distortion to which the human eye is particularly sensitive. Consequently, bi-orthogonal wavelets are nowadays probably the most widely used.

The DWT is the most computationally efficient of all wavelet transforms. However, critical sampling causes significant drawbacks. This transform lacks shift-invariance, zero-crossings often appear at the locations of signal singularities, and altering wavelet coefficients (for instance, during denoising) causes artefacts in reconstructed images. To overcome these problems, we may use a redundant wavelet representation.

### 2.2. Redundant wavelet transforms

In this section, we describe two redundant wavelet transforms: the undecimated discrete wavelet transform (UDWT) and the dual-tree complex wavelet transform (DTCWT) designed by Kingsbury & Selesnick (Selesnick et al., 2005).The former is calculated through the subband coding algorithm exploitingthe same filters as the DWT. The distinction lies in omitting the down-sampling step in decomposition. Instead, the decomposition filters are up-sampled at each stage (Starck et al., 2007). As a result, the UDWT is shift invariant which means that a shift in the input signal corresponds to a shift in the transform output and does not cause any other changes in the coefficient values. On the other hand, leaving out the down-sampling step yields a significant computation burden. The redundancy of this transform depends on the number of decomposition levels *d* denotes the number of dimensions. As a result, for 2-dimensional decomposition, the redundancy of this transform with respect to the DWT is 4:1 at stage 1, increases to 7:1 at stage 2, further increases to 10:1 at stage 3, etc.

In comparison with the UDWT, the DTCWT exhibits relatively moderate redundancy, which is 2^{d}:1 with respect to the DWT. In contrast to the UDWT, this ratio does not increase with the number of stages. To illustrate the redundancy, the DTCWT produces twice as many coefficient values than the DWT in 1-dimensionalspace (i.e. 2 subbands of complex coefficients composed of the real and the imaginary parts).

The DTCWT is realized with FIR (finite impulse response) filters forming a tight frame, and is thus easily revertible (unlike, for instance, the Gabor transform). As displayed in Fig. 3, this transform employs two DWT-like trees a and b producing respectively the real parts

It may seem surprising that a real signal is converted into the complex wavelet representation by using real-valued filters. This is possible thanks to the Hilbert transform built into each transform stage. Ideally, the complex scaling function

where sign denotes the sign function and *ω* the angular frequency. (The same relation applies also to the scaling function). As a consequence, the spectrum of the analytic wavelet is single-sided with zero magnitudes for negative frequencies. This property directly implies shift invariance, no aliasing, and the ability to isolate singularities of positive and negative directions in higher dimensions.

As the scaling and the wavelet function should be analytic, the filters associated with the real and the imaginary part of these functions must be delayed from each other by half a sample period

_{,}

where *a* and tree *b,* respectively. However, exact analycity cannot be achieved by functions with compact support. In other words, the Hilbert transformer is of infinite length and may not be exactly implemented with an FIR filter. Consequently, the wavelet and the scaling filters used in the DTCWT are only approximately analytic and shift-independent, and approximately fulfil condition (5).

In this chapter, we use the q-shift filter solution for the DTCWT by (Kingsbury, 2003). To achieve a higher degree of analycity at lower decomposition levels, this solution exploitsdifferent filters at level 1 than at higher levels. For level 1, it is possible to use the same filter set for both trees as long as it provides perfect reconstruction (for instance, one of the biorthogonal filter sets designed for the DWT). The only difference is that the filters in one tree are translated by one sample from the corresponding filters in the other tree

and also each to the other within the same tree (as indicated by the value in brackets in Fig. 3). Beyond level 1, we employ the approximately analytic q-shift filters, for which the low-pass (and also the high-pass) filters from opposite trees are time-reversed versions of each other

_{.}

A similar relation applies also to the analysis and synthesis filters. These filters are of even length and approximately symmetrical, as their point of symmetry is ¼-sample away (i.e. q-shifted) from the centre. Hence, the filters exhibit a q-sample group delay and their individual phase response is not exactly linear. On the other hand, the assymmetry makes the orthonormal perfect reconstruction feasible. For the overall complex wavelet and also the scaling function, the conjugate phase response is exactly linear and the magnitude is approximately shift-invariant. In addition to the shift-invariance property, theDTCWT provides better directional selectivity than both the DWT and UDWT in multiple dimensions.

### 2.3. Multidimensional wavelet transform

Computing the multidimensional wavelet transform is straightforward due to its separability. This property implies that the *n*-dimensional (*n*D) transform may be implemented as *n* consecutive 1D transform in different directions as illustrated in Fig. 4 for the DWT. For *n=3*, we may proceed for instance in the following order. First, each slice of the image volume is processed in the row direction resulting into the low frequency coefficients (i.e. the approximations *c*_{L}) and the high frequency coefficients (i.e. the details *c*_{H}). The resulting 1D decomposed slices are then processed in the column directionyielding the 2D transform of 4 coefficients subbands (*c*_{LL}, *c*_{LH}, *c*_{HL}, and *c*_{HH}) for each mage slice. Finally, the set of the 2D transform coefficients matrices is processed in the between-slice direction producing 8 subbands (*c*_{LLL},*c*_{LLH},*c*_{HLL}, …, and *c*_{HHH}). The coefficients *c*_{LLL} constitute an input to the next level of the 3D transform .

Please note that for the UDWT, the procedure is identical except that the size of the *3*-dimensional cube depicted in Fig. 4 doubles its size in the respective direction at each step. For the DTCWT, the multidimensional decomposition procedure is similar except producing a different number of subbands. For instance, the 2D DTCWT produces 8 subbands of complex-valued coefficients which correspond to 4:1 redundancy w.r.t. the 2D DWT of 4 subbands of real-valued coefficients. Despite being separable, the 2D DTCWT is truly directional. Its 6 directional subbands separate positive and negative singularity orientations(-75°, -45°, -15°, +15°, +45°, +75°). In contrast, the separable DWT mixes the negative and positive orientations together in its 3 subbands (0°, ±45°,±90°). The improved directional selectivity in higher dimensions represents another advantage over both the criticallysampled DWT and UDWT.

The volumetric wavelet transform does not necessarily need be uniform in all three directions. We may for instance use longer filters within the slices and shorter filters in the direction between slices . The rationale behind this lies in the variance of the spatial resolution in different directions. In the between-slice direction, the resolution is coarser than in the intra-slice directions (depending on the slice thickness and spacing).

## 3. Wavelet-based denoising methods

As proved in a range of signal processing research areas, the wavelet transform is a suitable representation for estimating noise free images from their noisy observations owing to its sparsity and multiscale nature.As outlined in Fig. 6, denoising is based on image

transformation into the wavelet domain and subsequent reconstruction from the altered detail coefficients and unchanged approximation coefficients from the last decomposition level. This procedure is called wavelet shrinkage and is associated with two main-stream approaches, which are discussed in this section: coefficient thresholding and probabilistic coefficient modelling.

### 3.1. Noise analysis

The quality of CT images depends directly on the radiation dose from an examination. The dose is influenced by the following quantities : X-ray tube current, exposure time, beam energy, slice thickness, table speed, type of the reconstruction algorithm, focal-spot-to-isocenter distance, detector efficiency, etc. Overall, CT image acquisition is a complex process affected many factors, such as the post-processing algorithm and nonlinearities of several parts of the device. By minimizing the radiation exposure of the patient, the amount of noise and artefacts increases.

Noise analysis represents a fundamental initial step of advanced noise suppression algorithms. In common devices, such as cameras with CCD (charged coupled device) or CMOS (complementary metal-oxide-semiconductor) sensors, noise analysis is based on acquisition of a testing pattern, which contains several patches with constant greyscale levels ranging from black to white . However, CT image volumes are a specific case and more realizations of the same acquisition process cannot be obtained so simply. One option would be to use phantoms (for instance water phantoms), which mimic the consistence of the human body (basically - bone, tissue, water, and air). It is then possible to analyze the noise component by using several images generated by scanning the phantom. Another option of obtaining the data for noise analysis would be to acquire the same volumetric slice twice during a regular examination. However, this would increase the radiation dose for the patient and the noise would be still impacted by motion-generated artefacts (e.g. from breathing). To prevent from any motion, the patient head would have to be tightly fixed which is not applicable.

New interesting possibilities arise with introduction of multi-detector scanners. Bosdorf et al. (2008) analyse noise in images from the latest generation dual-source CT-scanners, which produce two images of the same slice (one from each detector). The authors propose to extract the uncorrelated noise component via wavelet-based correlation analysis of the corresponding images. According to their findings, the noise in CT images is non-white, usually of an unknown distribution, and not stationary.

In our experiments, we analyzed images from a standard CT-scanner and without possessing the corresponding phantom images for noise analysis. In order to analyze noise characteristics, we selected patches of the background, i.e. the part of the image capturing no part of the patient’s head or the bed as apparent from Fig. 7b with an altered histogram. Nevertheless, this type of analysis does not reveal whether the noise is signal-dependent or not.

Even thoughwe are aware that the noise is not strictly additive, we assume the additive noise model

for the sake of simplicity (y denotes the observed signal, x the noise-free signal, and *n* independent noise).We analyze the noise for each image of the volume individually, since the noise variance changes considerably from slice to slice. Fig. 7a shows a typical intensity profile of a CT image.

As we demonstrate below, the noise obtained from the background patches as well as the observed signal may be modelledusing the GLM (generalized Laplacian model) or the GMM (Gaussian mixture model) in the spatial domain. The parameters of these models may be estimated exploiting the method of moments. This method is based on comparing the sample moments with the theoretical moments. Let us consider samples^{th} sample moment

and the theoretical moment

where*p(y)*denotes the probability density function of *y*.Theparameters

#### 3.1.1 Generalized Laplacianmodel of the noise

Every background patch is analyzed in the following way. We compute an optimized histogram to obtain the PDF (probability density function) of the noise, which is then tested for normality. The quality of the histogram shape depends greatly on the bin width

where the term in the parentheses denotes a so-called interquartile range between the 75th percentile n_(0.75) and the 25th percentile n_(0.25). In our experiments, the Kolmogorov-Smirnov test rejected the null hypothesis that the samples come from the Gaussian distribution at the significance level

Hence, it is necessary to employ a more general model. We choose a model with heavytails given by

where *μ* denotes the mean value, the parameter *n* with *μ* = 0*.* The second and the fourth moment of the noise *n* are given as

As proposed by Simoncelli & Adelson (1996), parameters estimation may be simplified using the kurtosis

The above described model is widely known as the generalized Laplacian model (GLM). This model is commonly used to model filtered images, such as the wavelet coefficients of high frequency bands. The histogram and the modelled PDF for a selected background patchare depicted in Fig. 8a using the logarithmic scale, which clearly illustrates the quality of the fit.

For denoising, we transform images into the wavelet domain. In case of the Gaussian-distributed noise, the noise parameters are preserved unaffected by the transformation. In contrast, forthe non-Gaussian noise, the parameters change and thusneed be re-estimated.We may either estimate the parameters directly in the wavelet domain (e.g. via the method of moments), or alternatively,transform the moments into the wavelet domain.

#### 3.1.2. Gaussian mixturemodel of the noise

Another possibility of modelling noise in the selected background patches is the Gaussian mixture model (GMM). This model is generally given by a mixture of a certain number of Gaussians with the variances*σ*_{kn} and the mean values*μ*_{kn}

where*α*_{kn} are the proportions of the mixturesatisfying the constraint *K* = 2. The GMM is than given by

where the mean values*μ*_{kn} are assumed to equal zero.

To estimate the model parameters in the wavelet domain, we may use the system of moment equations employing the second and the fourth central moment derived in (Švihlík, 2009). The noise parameters (the variances*n*} as follows.First, the variance

where

Using the logarithmic scale, Fig. 8b displays the result of fitting the model to the histogram of the selected background patch.

## 3.2 Wavelet coefficient thresholding

The method of wavelet coefficients thresholding is based on suppressing low-energy detail coefficients which are presumed to noise-dominated. To do this, we may choose different thresholding functions. The basic two thresholding function types are the hard thresholding and the soft thresholding function. For the former, the coefficients generated by alteringthe observed real-valued coefficients

where

where

Hard thresholding preserves the coefficient with the values greater than the threshold. This may, however, introduce discontinuities in the coefficients values, which may result in artefacts. In contrast, the soft thresholding function does not introduce artefacts owing to being continuous. This function complies with the assumption that noise is distributed evenly in all coefficients. However, when this is not the case, this technique reduces also the values of the coefficients corresponding to the underlying noise-free signal, which results in edge blurring .

### 3.2.1. Thresholdestimationmethods for the Gaussian noise

Threshold estimation methods vary in the assumption of the noise variance uniformity for different scales and subbands and in the threshold value estimation methods which they employ. In general, orthonormal transforms preserve the statistics of the i.i.d. (independent identically distributed)Gaussian noise. That is the reason why the most widely used methods are derived with this assumption.

The VisuShrink methodassumes the noise variance the same for all thresholded subbands. The noise variance is estimated using the median absolute deviation (MAD) of the coefficients from the highest-frequency subband which is presumed to be noise-dominated.

_{,}

where the constant in the denominator corresponds to the Gaussian distribution. The primary advantage of this variance estimation method is its robustness to outliers. The estimated noise varianceis than exploited for computing the universal threshold with the same value for all levels and subbands

_{,}

where *L* is the number of signal samples and log is the natural logarithm. This threshold computation formula is derived by minimizing the probability that any noise sample will exceed the threshold limit. The resulting threshold is applied to the wavelet coefficients via the soft thresholding technique defined in (21). VisuShrink removes the vast majority of noise from the image, but also tends to over-smooth the image since common signals are not sparse enough to comply with the minmax theory. In response to these findings, Donoho & Johnstone proposed another method called SureShrink (Stein’s Unbiased Risk Estimate Shrinkage), which produces subband-adaptive thresholds and is optimal in the mean-squared error sense. They further proposed a hybrid scheme combining the above approaches, since SureShrink does not perform well for situations of extreme sparsity.

In literature, SureShrink is often compared with BayesShrink for different types of data and the two methods. BayesShrinkderives the threshold within the Bayesian framework assuming a generalized Gaussian distribution of the wavelet coefficients and the additive noise model of the observed coefficients

where *N* = WT{*n*} denotesthe noise coefficients and*X = WT{x}*the noise-freesignal coefficients.Hence for the corresponding variances we may write

_{.}

The mean square error in a subband may be approximated by the corresponding Bayesian squared error risk with the generalized Gaussian as the prior. The threshold which minimizes the Bayesian risk (is nearly optimal) is produced as

These parameters are estimated from the data for each subband. The noise variance

where the estimate of the observed coefficients variance is computed from each subband given as

while assuming the zero mean.

### 3.2.2. Wavelet coefficients thresholding for the non-Gaussian noise

In case that the noise analysis identifies noise to be non-Gaussian and the moment equation systems for GMM and GLM models are not well satisfied, the thresholding method must be modified. The threshold value is usually derived for the noise with the Gaussian distribution. In accordance with the outcomes of the noise analysis we proposed a simple equation for the evaluation of the threshold value of the GLM. For the universal threshold from (24), the threshold value

where

Now let us consider the DTCWT. In case of complex-valued coefficients, we threshold the magnitudes while keeping the phase unchangedas described in (22). It is evident that the PDF of the wavelet coefficients magnitudes is asymmetric, and its mean is not zero. We consider the real and imaginary components of the complex coefficients to be i.i.d. Gaussian. Hence, the magnitude is Rayleigh-distributed. The Rayleigh PDF is given by

where

where parameter

where

A selected CT slice from our image database thresholded assuming a non-Gaussian noise by using various wavelet transforms is depicted in Fig. 11. The results appear similar, except that in case of the DWT, the image is slightly over-smoothed.Fig. 9 shows the sameimage using theintensity profiles, whose sample variances

Additionally, we produced another set of denoising results using the BayesShrink method described in subsection 3.2.1. As depicted in Fig. 12, we performed BayesShrink using the DTCWT and the DWT. In case of the DTCWT, we used equation (33) for computing the parameter *σ* of the Rayleigh distribution and also the second central moment for both the observed signal and the noise extracted from the background patches. Similarly for the DWT, we assumed the GLM and evaluated its parameters both for the noise model and the observation in the wavelet doman. In this experiment, the BayesShrink method produced a too small threshold for the DWT. For the DTCWT, the threshold seems appropriate, since the difference image appears to contain primarily noise.

### 3.3. Statistical modelling of the wavelet coefficients

The. other broad family of wavelet-based noise reduction methods is based on statistical modelling of the wavelet coefficients. As demonstrated in numerous publications (Romberg et al., 2001), these methods usually outperform the thresholding techniques with respect to the result quality. On the other hand, they also yield greater computational complexity. In this subsection, we discuss two different methods: the hidden Markov trees (HMT) and two types of the marginal probabilistic models discussed above - the GMM and the GLM.

#### 3.3.1. Bayesian estimator

The probabilistic methods utilize the Bayesian estimator to estimate the underlying signal from its noisy observation based on the a priori information. There are two basic variants of this estimator, depending on whether the estimator is designed to optimize the minimum mean square error(MMSE) or the maximum a posterior(MAP) risk function.

Once again, we assume the additive noise model of the noisy wavelet coefficients observations in (25).The conditional mean of the posterior PDF *X*.The MMSEestimator (Simoncelli & Adelson, 1996)is given as

where

The MAP estimator is given by the following formula

Bayesian statistics represent a powerful signal estimation tool (Rowe, 2003). In contrast to the classical Fisher approach, which exploits only the observed data, the Bayesian approach allows subsuming the prior information into the solution and thus produces useful results even for small datasets.

#### 3.3.2. Wavelet-based a priorimodels

The above described marginal models (the GMM and the GLM) are suitable for modelling noise in tBayesian estimators, as discussed in subsections 3.1.2 and 3.1.1, and also as the a priori model of the noise-free signal. Assuming additive noise in the wavelet domain from (25), the observed coefficients *Y* are given as a summation of two GMM distributions. Hence, its second and fourth theoretical central moments are given by

where the moments of *Y* and *N*are evaluated using the sample moments. The*k*^{th}central sample moments of a random variable *R* is given by

From(36) and (37),it is possible to compute themoments *X*. Finally,the signal parameters may be estimated using the same equations as for the noise*N*(i.e. (18) and (19)).The parameters estimation results highly depend on the estimation quality of the sample moments.

For the wavelet-based GLM, equations (36) and (37) are also valid.The GLM of the noise-free signal is given as

where

Using equations (36) and (37),we obtain

And from the second central moment we derive

Again, the values of the moments for *Y* and *N*may be estimated from the data using the sample moments exploiting (38).

Noise suppression methods based on Bayesian estimation are generally more efficient than thethresholding methods, mainly in case of considerable noise contamination. Fig. 13 shows the Shepp-Logan phantom contaminated by generalized Laplacian noise (

which was computed for the original image

The described Bayesian estimation represents a powerful tool for image denoising. However, we presume that one of the following factors is in conflict with solvability of the denoising task: the analyzed noise isnot strictly Additive and/or signal-independent, or the derived equation systems for the GMM and the GLM (originally derived for astronomical data (Švihlík & Páta, 2008)) are not well satisfied.

#### 3.3.3. Wavelet-based hidden Markov trees

Wavelet coefficients thresholding methods assume the wavelet transform to de-correlate the signal thoroughly. However, this is not a correct assumption. The wavelet coefficients are interrelated and exhibit persistence across scale and clustering within scale. Both these properties are captured by the hidden Markov tree (HMT) models (Baraniuk, 1999). Acording to the persistency property, the coefficient values propagate across scale from parent to child within the tree. This means that a large parent coefficient corresponding to a signal singularity should have large children coefficients, while a parent coefficient associated with a noise-related singularities should not. Clustering within scale signifies that a large (small) coefficient value is expected in the vicinity of a large (small) coefficient.

As depicted in Fig. 14, the HMT connects the hidden states of a child node *S*_{i} and a parent node *S*_{p(i)} and not the actual coefficients values *Y*_{i}, *Y*_{p(i)} associated with these states. For modelling the inter-scale dependencies, the HMT uses an *M*-component mixture of conditional Gaussian distributions

where *S*_{i} of the node *i* satisfying *(M* = 2).

The intra-scale dependencies are captured by the transition probabilities. For M=2, the transition probability matrix connecting the children hidden states

The persistence property implies that

The HMT are used for denoising in the following fashion (Crouse et al., 1998). First, the model is fitted to the noise observation coefficients using the expectation maximization (EM) algorithm. This training algorithm comprises the E-step, in which the state information propagates upwards and downwards through the tree, and the M-step, in which the model parameters

Second, the trained model is exploited as the a priori signal PDF in order to compute the conditional mean estimates of the noisy observations given the noise-free signal coefficients.

Again, we assume the additive nose model in (25), where *N* is the independent identically distributed Gaussian noise. The conditional mean estimate of

where

As example of HMT-based image denoising is displayed in Fig. 16. The wavelet-basedHMTs (Romberg et al., 2001) capture dependencies between the wavelet coefficients across scales, since image singularities, such as edges, persist across scales, while the noise-dominated coefficients lack such intra-scale consistence. This approach yields very good denoising results, however requires a computationally expensive training.

## 4. Conclusion

In this chapter, we described different types of the wavelet transform including both non-redundant (the DWT) and redundant representations (the DTCWT and the UDWT). In general, the DTCWT and the UDWT outperform the DWT due to their shift-invariance; however, at the expense of redundancy. The DTCWT is less redundant than the UDWT and provides better directional selectivity.

We carried out preliminary noise analysis on 2 CT-datasets. The noise component in CT images is very complex and, presumably, also non-stationary. However, for simplicity, we assumed that the noise is stationary within the image and used the additive noise model. In order to model the noise, we selected a few background patches and modelled their distributions using the generalized Laplacian model or the Gaussian mixture model. The noise models were then exploited for wavelet coefficients thresholding. This required deriving threshold estimation equations corresponding to these models and also for the assumed Rayleigh distribution of the DTCWT coefficients magnitudes. The denoising results by using each of the three described wavelet transform were demonstrated on the CT-image data.

Since the threshold-based methods tend to blur edges and cause artefacts (the extent of this problem depends based on the used wavelet transform and thresholding method), we decided to implement the probabilistic methods. In general, these methods outperform the thresholding methods in the resulting image quality (mainly in case of considerable noise contamination); however, at the expense of greater computation cost. The described Bayesian estimation represents a powerful tool for image denoising. However, we presume that one of the following factors is in conflict with solvability of the denoising task: the analyzed noise is not strictly additive and/or signal-independent, or the derived equation systems for the GMM and the GLM are not well satisfied. In this perspective, the thresholding methods due to their relative simplicity revealed greater robustness towards non-fulfilment of the assumptions of the noise characteristics. On the other hand, the hidden Markov trees (HMT) of wavelet coefficients performed well and produced good quality image results with reduced noise and unblurred edges.

In future work, we shall focus on experiments in a larger scale. We shall carry out a thorough noise analysis on more datasets (on different CT-scanners if applicable) and also compare different denoising methods both by interviewing medical experts and by designing and evaluating appropriate quality metrics. Regarding methods development, we will further focus on the probabilistic methodsexperimenting with the use of more than two components in the Gaussian mixture model, the DTCWT instead of the DWT, and possibly also volumetric denoising techniques.