## 1. Introduction

The proliferation of SONAR images produced by different equipments: multibeam echo sounders, side scan sonar, forward looking imaging SONAR…, [Lurton, 2002], and the large number of processing methods, created the necessity of expert systems for assisting the decision making process. An example is the SonarScope, an IFREMER expert system [Ifremer, 2010]. The basic functionality of such an expert system is the representation and the analysis of sonar data, organized as a "multilayer" structure defined by its various attributes (bathymetry, image, angles, and raw data from auxiliary sensors …). These data can be represented and processed using various techniques either classical (signal and/or image processing) or specific to SONAR. The goal of an expert system for SONAR images is to achieve three main tasks: Quality Control, Data Processing and Data Interpretation. There are two classes of signal processing techniques used in an expert system for SONAR images. The first one represents the so-called “image conditioning” methods. The first category of conditioning methods refers to the echoes acquisition, meaning the acquisition of depth, of the across and along-track distance, of the received beam angle, of the numbers of the transmitted and received beam, or of the two-way travel time of the acoustic pulse. The echo acquisition has some features like for example the bottom detection mode. The second category of conditioning method refers to the correction of acquired data. There are techniques to complete the missing data (caused by some acquisition imperfections due to unexpected displacements of the platform where the acoustical transducers are mounted or by the non-uniform movement or deviations from the course of the vessel which tracts the platform) and techniques for the correction of the differences between the directivity characteristics of the sensors. The third type of conditioning techniques consists in data organization and management. SONAR data are arranged as a set of arrows called pings (swath), which correspond to all the soundings acquired in a ping cycle. At the beginning of the ping cycle, each sensor's value is logged: gyro, pitch, roll, positions, etc.

The fourth category of conditioning methods refers to the formation of SONAR images. A SONAR image is thus linked with the sensor’s time series, pertaining to that given dataset and in synchronization with each ping. The time series can be plotted as curves, displayed in conjunction with the image. There are few geometric formats used for the SONAR images: PingBeam, LatLong, PingSamples and PingSwat. The fifth category of conditioning methods refers to the assembling of few neighboring individual images. This could be a technique to produce mosaics or digital terrain models (DTM). Finally the sixth conditioning category of methods refers to the angular correction and despecklisation of SONAR images. The last category of image processing methods represents the aim of this chapter. The second class of signal processing methods applied in SONAR expert systems is represented by the so called intelligent image processing techniques: segmentation, textures analysis, classification,…The aim of this chapter is the despecklisation of SONAR images. The SONAR images are perturbed by speckle. It is of multiplicative nature. The aim of a denoising algorithm is to reduce the noise level, while preserving the image features. The noisy image is:

where s is the noise-free input image and *v* the speckle noise. We present the most frequently used despecklisation techniques starting with the classical ones, Lee, Kuan and Frost filters and continuing with the pure statistical despecklisation method proposed in [Walessa&Datcu, 2000] to arrive at the most modern ones which act in the wavelet domain.

## 2. Denoising methods

The field of natural images denoising methods is very large. A lot of articles dedicated to denoising methods were already written. Most of these articles are focused on the case of additive noise.

There are two ways of reducing the speckle to an additive noise. The first one involves the separation into a sum between the signal and a signal-dependent noise:

while the second one simply takes advantage of the properties of the logarithm:

Assuming that usually

### 2.1. Denoising based on differential equations with partial derivatives

A first category of images denoising methods acting in the spatial domain are based on differential equations with partial derivatives. These denoising methods do not take into account any a priori information about the image to be processed, being non parametric. The aim of those methods is to consider the denoising like a decomposition of the acquired image into two components, the noiseless part and the noise. This decomposition can be realized by the projection of the acquired image on two very different vector spaces. The projection can be done by the minimization of a cost function. The result corresponds to the solution of the system of equations which is obtained imposing the zero value to the partial derivatives of the cost function. This is a system of partial differential equations. The simplest projection on the noiseless space is realized averaging the acquired image. The corresponding denoising method works well for the homogeneous regions of the acquired image if the noise is zero mean. Tacking into account the fact that the averager is a low-pass filter, this denoising method distorts the edges and some of the textured regions by oversmoothing.

### 2.2. Denoising by non-local averaging

Another very modern non parametric denoising method is based on non-local (NL) averaging [Buades, 2007]. The NL-means algorithm tries to take advantage of the high degree of redundancy of any natural image. Every small area (or window) in a natural image has many similar areas in the same image. In a very general sense, one can define as „neighborhood of a pixel *i*" any set of pixels *j* in the image so that a window around *j* looks like a window around *i*. All pixels in that neighborhood can be used for predicting the value at *i*. Given a discrete noisy image*i* and *j* and satisfy the usual conditions

### 2.3. Denoising using statistical filters

Another category of denoising methods are the parametric ones. These methods take into account statistical models for the noiseless component of the acquired image and for the noise. One of the best parametric denoising methods uses maximum a posteriori (MAP) filters. The MAP estimation of *w*, based on the observation *y=w+n*, (where *n* represents the additive noise) is given by the MAP filter equation:

where *p*_{a} denotes the probability density function (pdf) of *a*.

#### 2.3.1. Classical despecklisation filters

Kuan [Kuan et al., 1987] considered a multiplicative speckle model and designed a linear filter based on the minimum mean square error (MMSE) criterion, optimal when both the scene and the detected intensities are Gaussian distributed.

The Lee filter [Lee, 1981] is a particular case of the Kuan filter due to a linear approximation made for the multiplicative noise model.

The Frost filter [Frost et al., 1982] is a Wiener filter adapted to multiplicative noise. The parameters of the Kuan, Lee and Frost filters are: the size of the rectangular windows used for the estimation of the local standard deviation of the useful component of the acquired image and its number of looks.

### 2.4. Denoising in a transform’s domain

Another excellent denoising method for the case of additive noise makes the object of [Katkovnik et al.,2006]. There are a lot of original contributions highlighted in this reference: the local polynomial analysis of images, the role of anisotropic analysis windows, etc… Some very clever associations of those notions conduct to very efficient local denoising methods applied in the spatial domain or in the field of the shape adaptive Digital Cosinus Transform (SADCT).

#### 2.4.1. Denoising into the wavelets domain

A set of denoising methods for additive noise act in the wavelets domain. These methods have three steps: the computation of a wavelet transform (WT); the filtering of the detail wavelet coefficients; the computation of the corresponding inverse WT (IWT). The usefulness of the filtering in the wavelet domain comes from the sparseness of the WTs. There are only few wavelet coefficients with high magnitude, which concentrate the most of the energy of the noiseless component of the input image. The other wavelet coefficients have small magnitude and can be considered noise. They can be discarded without producing high distortions.

A first category of denoising methods applied in the wavelets domain is based on non-parametric techniques [Donoho&Johnstone, 1994] and uses the hard or the soft thresholding filters. The parent of these methods is Professor David Donoho from Stanford University. His goal was to estimate the noiseless component of the acquired image by the minimization of the min-max approximation error. He studied the case of the Discrete Wavelet Transform (DWT). In the following we will make a general presentation of the wavelet transforms and we’ll continue with the denoising techniques based in these transforms.

### 2.4.1.1. Wavelet transforms

There are few wavelet transforms (WT) which were used for images denoising. The first one was the DWT.

*2.4.1.1.1. The DWT*

All the WTs are characterized by two features: the mother wavelets, MW and the primary resolution, PR (number of iterations). The importance of their selection is highlighted in [Nason, 2002]. An appealing particularity of the 2D DWT is the interscale dependency of the wavelet coefficients. The main advantage of the implementation of the 2D DWT is its flexibility, as it inherits some of the classes of mother wavelets developed in the framework of the 1D DWT, like the Daubechies, Symmlet or Coiflet families. The implementation of 1D DWT is presented in figure 1. It implements an orthonormal multiresolution analysis [Mallat, 99]. Each of the iterations of the algorithm used for the computation of the 2D DWT implies several operations. First, the lines of the input image (obtained at the end of the previous iteration) are passed trough two different filters (a lowpass filter having the impulse response *m*_{0} and a high-pass filter *m*_{1}) resulting two different sub-images. Then the lines of the two sub-images obtained at the output of the two filters are decimated with a factor of 2. Next, the columns of the two images obtained are low-pass filtered with *m*_{0} and high-pass filtered with *m*_{1}. The columns of those four sub-images are also decimated with a factor of 2. Four new sub-images, representing the result of the current iteration (which corresponds to the current decomposition level (or scale)), are obtained. These sub-images are called subbands. The first sub-image, obtained after two lowpass filterings, is called approximation sub-image (or LL subband). The other three are named detail sub-images: LH, HL and HH. The LL sub-image represents the input for the next iteration. In the following, the coefficients of the DWT will be denoted with*x* represents the image who’s DWT is computed (considered as a bivariate random signal), *m* represents the current scale and *k* = 1 - for the subband LH, *k* = 2 - for HL, *k* = 3 - for HH and *k* = 4 - for LL. These coefficients are computed using the following relation:

where the wavelets are real functions and can be factorized as:

and the two factors can be computed using the scale function

where:

Taking into account the last equations it can be written:

An iteration of the 2D DWT is presented in figure 2. The main disadvantages of the 2D DWT are the poor directional selectivity and the shift sensitivity. Separable filtering along the rows and columns of an image produces four images at each level. The LH and HL bandpass sub-images can select mainly horizontal or vertical edges respectively, but the HH sub-image contains components from diagonal features of either orientation. This means that the separable real 2D DWT has ‘poor directional selectivity’. This is illustrated in fig. 3. From left to right we can observe the vertical details (HL), the horizontal details (LH) and the diagonal details (HH). Prof. Kingsbury, in [Kingsbury, 1999] explains this limitation by the fact that real highpass row filters select both positive and negative horizontal high frequencies and, consequently, the combined HH filter must have pass-bands in all four quadrants of the 2-D frequency plane.

On the other hand, a directionally selective filter for diagonal features with positive gradient must have pass-bands only in quadrants 2 and 4 of the frequency plane, while a filter for diagonals with negative gradient must have pass-bands only in quadrants 1 and 3. The poor directional properties of real separable filters make it difficult to generate steerable or directionally selective algorithms, based on the separable real DWT. Other WTs which overcomes one or both these disadvantages are: the Undecimated Discrete Wavelet Transform (UDWT), the Double Tree Complex Wavelet Transform (DTCWT) and the Hyperanalytic Wavelet Transform (HWT).

*2.4.1.1.2. The 2D UDWT*

A particular wavelet transform is the undecimated wavelet transform (UWT), also known as the ‘Stationary Wavelet Transform’ and can be implemented using the ‘Algorithme à Trous’. The UDWT refers to the discrete implementation of this transform. The ‘Algorithme à Trous’, first introduced in [Holschneider et al., 1989] for music synthesis applications, is similar to a non orthonormal multiresolution algorithm for which the discrete wavelet transform is exact. An implementation of the UDWT is presented in figure 1. Due to the absence of decimators in the UDWT’s implementation, each coefficient sequence from any level of decomposition has the same length as the original, in other words, if the original signal has *N* samples, the UDWT *L*-level representation is of size *N*(*L*+1), making from the UDWT a highly redundant transform. But eliminating the decimators is obtained a translation invariant WT. The 2D UDWT is shift-invariant (due to the absence of decimators) but it has a poor directional selectivity and is very redundant.

*2.4.1.1.3. The 2D DTCWT*

In 1998, Kingsbury first introduced the DT CWT [Kingsbury, 1998] that relies on the observation that approximate shift invariance can be achieved with a real DWT by doubling the sampling rate at each level of the tree. For this to work, the samples must be evenly spaced. The sampling rates can be doubled by eliminating the down-sampling by 2 after the level 1 filters. This is equivalent to having two parallel fully-decimated trees *a* and *b*, like in fig. 4, provided that the delays of *H*_{0b} and *H*_{1b} are one sample offset from *H*_{0a} and *H*_{1a}. Kingsbury found that, to get uniform intervals between samples from the two trees below level 1, the filters in one tree must provide delays that are half a sample different (at each filter input rate) from those in the other tree. This statement is also supported by Selesnick who, in [Selesnick, 2001], gives an alternative derivation and explanation of the same result. The implementation of such a transform is done using two mother wavelets, one for each tree, one of them being (approximately) the Hilbert transform of the other. On one hand, the dual-tree DWT can be viewed as an overcomplete wavelet transform with a redundancy factor of two. On the other hand, the dual-tree DWT is also a complex DWT, where the first and second DWTs represent the real and imaginary parts of a single complex DWT. In order to have a visual aspect of the DT CWT, we present in figure 4, the Q-shift version of the DT CWT as it is given in [Kingsbury, 2001]. In order to examine the shift invariance properties of a transform, Kingsbury [Kingsbury, 2001] proposes a method based on the retention of just one type (details or approximations), from just one level of the decomposition tree. For example one might choose to retain only the level-3 detail coefficients and set all the others to zero. If the signal reconstructed from just these coefficients is free of aliasing then it can be said that the transform is shift invariant at that level. The degree of shift invariance of two implementation schemes (one for the DT CWT and the other for the classical DWT) is compared in fig. 5. In each case the input is given by 16 shifted versions of a unitary step signal. Each unitary step is passed through the forward and inverse version of the chosen wavelet transform. The figure shows the input steps and the components of the inverse transform output signal, reconstructed from the wavelet coefficients at each of levels 1 to 4 in turn and from the scaling function coefficients at level 4. Summing these components reconstructs the input steps perfectly. Good shift invariance is shown when all the 16 output components from a given level have the same shape, independent of shift. It is easily observed that the DT CWT has outstanding performances in this direction compared to the severe shift dependence of the normal DWT.

Extension of the DT CWT to two dimensions is achieved by separable filtering along columns and then rows. However, if column and row filters both suppress negative frequencies, then only the first quadrant of the 2-D signal spectrum is retained. It is well known, from 2-D Fourier transform theory, that two adjacent quadrants of the spectrum are required to represent fully a real 2-D signal. Therefore in the DT CWT the image is also filtered with complex conjugates of the row (or column) filters in order to retain a second (or fourth) quadrant of the spectrum. This then gives 4:1 redundancy in the transformed 2-D signal. A schematic representation of the 2D DT CWT based on the even-odd implementation was given by [Jalobeanu et al., 2003]. At level *m* = 1, the 2D DT CWT is simply a non-decimated wavelet transform (using a pair of odd-length filters *h*^{o} and *g*^{o}) whose coefficients are re-ordered into 4 interleaved images by using their parity. This defines the 4 trees *T* = *A*, *B*, *C* and *D*. If *a* and *d* denotes approximation and detail coefficients (*a*^{0} = *x*, the input image), we have:

For all other scales (*j* > 1), the transform involves an additional pair of filters, even-length, denoted *h*^{e} and *g*^{e}. There must be a half-sample shift between the trees to achieve the approximate shift invariance. Therefore, different length filters are used for each tree, i.e. it is necessary to combine *h*^{e}, *g*^{e} with *h*^{o}, *g*^{o}, the 4 possible combinations corresponding to the 4 trees:

The trees are processed separately, as in a real transform. The combination of odd and even filters depends on each tree. The transform is achieved by a fast Filter Bank (FB) technique, of complexity *O*(*N*). The reconstruction is done in each tree independently, by using the dual filters. To obtain *a*^{0}, the results of the 4 trees are averaged. This ensures the symmetry between them, thus enabling the desired shift invariance. The complex coefficients are obtained by combining the different trees together. If the subbands are indexed by *k*, the detail subbands *d*^{j,k} of the parallel trees *A*, *B*, *C* and *D* are combined to form complex subbands

Complex filters in multiple dimensions can provide true directional selectivity, despite being implemented separately, because they are still able to separate all parts of the m-D frequency space. For example the 2D DT CWT produces six bandpass sub-images of complex coefficients at each level, which are strongly oriented at angles ±15º, ±45º, ±75º, as illustrated by the level 4 impulse responses in fig. 3.

In order to obtain these directional responses, it is necessary to interpret the scaling function (lowpass) coefficients from the two trees as complex pairs (rather than as purely real coefficients at double rate) so that they can be correctly combined with wavelet (highpass) coefficients, which are also complex, to obtain the filters oriented at ±15º and ±75º (see 5). The main property of the 2D DT CWT is the quasi shift invariance, as shown by Kingsbury in [Kingsbury, 2001] i.e. the magnitudes

*2.4.1.1.4. The HWT*

In [Abry, 1994] is demonstrated that approximate shiftability is possible for the DWT with a small, fixed amount of transform redundancy. In this reference is designed a pair of real mother wavelets such that one is approximately the Hilbert transform of the other. This wavelet pair defines an analytic discrete wavelet transform (ADWT) presented in figure 7 a). A complex wavelet coefficient is obtained by interpreting the wavelet coefficient from one DWT tree as being its real part, whereas the corresponding coefficient from the other tree is interpreted as its imaginary part. The implementation of the ADWT is presented in figure 7 b). We first apply a Hilbert transform to the data. The real wavelet transform is then applied to the analytical signal associated to the input data, obtaining complex coefficients. The two implementations of the ADWT presented in figure 7 are equivalent because:

(6) |

In fact neither the DTCWT nor the proposed implementation of ADWT correspond to perfect analytic mother wavelets, because the exact digital implementation of a Hilbert transform pair of mother wavelets with good performance is not possible in the case of the first transform (due to the fractional delay between the two trees required) and because the digital Hilbert transformer is not a realizable system in the case of the second transform. The DTCWT requires special mother wavelets (the implementation of the ADWT proposed in figure 7 b) can be realized using classical mother wavelets like those conceived by Daubechies) but can assure a higher degree of shift invariance. These two transforms have in the 1D case a redundancy of 2. In order to evaluate the shift-invariance performance of ADWT we make a visual evaluation of the degree of shift invariance by taking the same test professor Kingsbury has proposed in [Kingsbury, 2001]. We have used this test to make a comparison between ADWT, DT CWT, and the classical DWT. As can be seen, we obtain, using Daubechies 10-taps filter, results comparable with those obtained by Kingsbury. From fig. 5 it can be observed that the DWT is not shift-invariant; the lines of coefficients corresponding to different shifts are not parallel, while the ADWT and DT CWT are quasi shift-invariant.

All the 1D WTs already mentioned have simpler or more complicated 2D generalizations. The generalization of the analyticity concept in 2D is not obvious, because there are multiple definitions of the Hilbert transform in this case. In the following we will use the definition of the analytic signal associated to a 2D real signal named hypercomplex signal. So, the hypercomplex mother wavelets associated to the real mother wavelets

where

(8) |

So, the HWT of the image

The enhancement of the directional selectivity of the HWT is realized like in the case of the 2D DTCWT, by the separation of the real and imaginary parts of the complex coefficients belonging to each subband of the WT (eq. 5). The HWT implementation is presented in figure 8.

Let us consider, for example, the case of the diagonal detail subbands, (HH), presented in figure 9. We selected a particular input image, *f* with the outputs

In conclusion, by using the HWT these directions can be separated. The same strategy can be used to enhance the directional selectivity in the other two subbands: LH and HL, obtaining the preferential orientations:

The two complex WTs outperform the 2D DWT. The behavior of the HWT is quite similar with the comportment of the 2D DTCWT. One of the goals of the present chapter is to compare the efficiency of those WTs in despecklisation. A second goal of this chapter is to compare the efficiency of the filters applied in the wavelet domain for SONAR despecklisation.

### 2.4.1.2. MAP Filters in the wavelets domain

As was already said, after the transformation of the multiplicative noise into an additive one based on equations (2) or (3), the WT of the signal obtained is computed. Tacking into account the linearity of the WT, it can be written the equation *y=w+n*, where the first term in the right hand member represents the WT of the noiseless component and the second term represents the WT of the noise. Then the MAP filter equation (4) must be solved. Its solution depends on the selection of the models of pdfs for the noiseless component and for the noise. There are two categories of MAP filters: marginal (both pdfs are univariate functions) and joint (both pdfs are multi-variate functions).

The simpler MAP filter is the marginal zero order Wiener filter. It is constructed considering that the noiseless component of the acquired image and the noise are Gaussian distributed.

*2.4.1.2.1. The zero order Wiener filter*

We will derive first the joint zero order Wiener filter considering that both pdfs are *d*-dimensional zero mean Gaussians:

where the covariance matrix of vector **a** was denoted by **C _{ a}** and its determinant by

*d*=1. Then, the MAP filter equation (4) becomes:

or:

problem which is equivalent with the equation obtained by putting the derivative of the argument of the right hand side member equal with zero:

Considering the following forms of the *d*-dimensional vectors:

where the elements are zero mean independent random variables *a*_{k}, *k*=1,2,…,*d*, with variances

where **I** represents the *d*×*d* unitary matrix. Its determinant is expressed by:

So, the multi-variate pdf of the Gaussian random process from (9), for independent univariate components can be put in the equivalent form:

If all the independent random variables *a*_{k}, *k*=1,2,…,*d,* have the same variances

Substituting (13) into equation (11), the following system of equations is obtained.

The solution can be also written in matrix form:

This is the expression of the **joint zero order Wiener estimator.** For *d=*1, the last equation can be written in the particular form:

This is the input-output relation of the marginal zero order Wiener filter. Unfortunately, the two variances in the last equation are not known a priori. The noise variance can be estimated globally, using the diagonal detail sub image obtained at the first decomposition level,

Generally, the variance *P*+1) (2*P*+1) denoted by

and next its local variance :

Using these values, the local variance of the noiseless component is computed by:

The zero order Wiener filter can be applied in association with the 2D DWT or 2D DT CWT or HWT. Its main disadvantage is the fact that the model considered for the noiseless image is not adequate because the wavelet coefficients have a heavy tail distribution (due to the sparseness of the WTs). Due to the sparseness of the WTs of the noiseless component, its distribution is heavy tailed. Indeed, there are only few wavelet coefficients with high magnitude; the majority of the wavelet coefficients have small magnitudes producing the heavy tails of the distribution. This affirmation is verified with the aid of the following experiment.

The histograms of different subbands obtained applying the HWT to the image Lena are computed. The results are represented in figure 11. The linear dependencies of the two branches of the logarithms of the histograms prove that the pdfs of the real and imaginary parts of the HWT coefficients correspond to exponential laws (which are heavy tailed):

where *K*_{1} and *K*_{2} represent two constants. The Gaussian model supposed for the construction of the marginal zero order Wiener filter does not correspond to the linear dependencies of the two branches of the histograms:

So, the hypothesis that the real and imaginary parts of the useful HWT coefficients are distributed following Laplace (exponential) laws can be made.

Taking into consideration the histograms of the noiseless wavelet coefficients it can be observed that their repartition is heavy-tailed. The majority of the wavelet coefficients have small magnitude (close to zero). So, the repartition of the wavelet coefficients can be considered of Laplace type.

On the basis of this hypothesis, another MAP filter named adaptive soft thresholding can be constructed.

### 2.4.1.2.2. The adaptive soft-thresholding filter

If *n* is Gaussian distributed and *w* has a Laplacian distribution then the MAP filter becomes an adaptive soft thresholding filter (STF). The hypotheses for this type of marginal MAP filter are:

The MAP filter equation (4) becomes:

To maximize the argument of the right hand side, the following equation must be solved:

For w>0 it becomes:

For w<0 the equation (24) becomes:

The condition w>0 implies the condition

equivalent with the final form:

where:

This MAP filter is a STF with the threshold’s value equal to

### 2.4.1.2.3. Inter-scale dependence of wavelet coefficients

The wavelet coefficients are characterized by interscale dependence.

The cross-correlation of two wavelet coefficients belonging to the 2D DWT of an image f, located in the same subband *k* at the scales *m*_{1} and *m*_{2}= *m*_{1}+*q* and having the geometrical coordinates (_{2}, p_{2}) respectively is given by the following relation:

which represents an inter-scale and intra-band dependency. If the mother wavelet **orthogonal** basis of

and the expression of the inter-scale and intra-band dependency of the coefficients cross-correlation becomes:

This cross-correlation is function of the autocorrelation of the input signal. If we refer to pairs of wavelet coefficients located at two consecutive decomposition levels (q=1, m_{2}= m_{1}+1) at the same coordinates (

which means that the child and the parent coefficient are correlated. So, if the magnitude of the parent coefficient is large (small) then the magnitude of the child coefficient is also large (small). Taking into consideration that the computation of the HWT reduces at the computation of the 2D DWTs of four related images

The MAP filters which take into account this dependence are multivariate and have better performance. Some bivariate probability density functions are constructed starting from the Gaussian scale mixture (GSM) statistical model [Selesnick, 2008].

### 2.4.1.2.4. The GSM model

The construction of the bivariate pdf can be done with the aid of Gaussian Scale Mixtures (GSM). This simple statistical model has been used to model natural signals such as speech and more recently the wavelet coefficients of natural images. The model is given in equation (35). It assumes that each vector of coefficients **w** is specified by a stationary bivariate zero mean Gaussian process **x** and a spatially fluctuating variance z.

The multiplier z is usually a function of the surrounding coefficient values (like the local variance of the coefficients within the same scale or a more complex function of the neighboring coefficients within the same and adjacent scales). The result is always leptokurotic (kurtosis ≥ 3), its distribution having long tails. The MMSE estimate with such priors takes the form of a locally adaptive Wiener-like estimator. Usually the number of elements of the vectors **x** and **w** is d=2. To model the self-reinforcing property of the coefficients, z must be slowly varying but it does not need to be symmetric in all directions. It has been shown that for slowly varying z this model can successfully simulate the high kurtosis and longer tails of the marginal distributions. The stationary portion of the model **x** is Gaussian distributed over a small neighborhood of wavelet coefficients. It is generally assumed that z varies slowly enough to be considered constant over that neighborhood of coefficients. Under this assumption the model is now a particular form of a spherically invariant random process called a GSM. For a small neighborhood of coefficients at nearby spatial locations and scale, a GSM vector **w** is the product of two independent random variables: a positive scalar z referred to as the hidden multiplier or mixing variable and a Gaussian random vector **x**. The pdf of the Gaussian vector **x** is given by:

(see (12)). Setting**w** is given by:

Tacking into account the relation of *a* and *z*, the pdf of *a* can be expressed on the basis of the pdf of *z*:

Substituting (38) into (37), the expression of the pdf of **w** becomes:

It remains to specify the prior probability function p_{z}(z) for the multiplier z. Some propositions of prior probability functions are made in [Selesnick, 2008]. One of these propositions is the Gamma law:

Substituting (40) into (39), the expression of the pdf of **w** becomes:

Finally:

This is a d-dimensional spherically-contoured multivariate pdf [Selesnick, 2008]. For d=2, this pdf is bivariate:

With the aid of this bivariate distribution the bishrink MAP filter [Sendur&Selesnick, 2002] can be constructed. This construction will be explained in the following section. For d=1, the pdf in (43) is univariate:

This univariate pdf is of the form of the Laplace law; see the hypotheses of the adaptive STF, (22). In consequence the GSM hypothesis, even for the case of a unique scale, is useful for modeling the repartition of the wavelet coefficients. Other proposition of prior probability function made in [Selesnick, 2008] is the exponential

It results a third spherically-contoured bivariate pdf. A last bivariate pdf for the wavelet coefficients was proposed in [Achim&Kuruoglu, 2005]:

Using one of these models, in [Selesnick, 2008] is conceived a bivariate MAP filter called bishrink.

### 2.4.1.2.5. The bishrink filter

The noise is assumed i.i.d. Gaussian,

The model of the noise-free image is given in (43):

a heavy tailed distribution. Substituting these two pdfs in the equation of the MAP filter (4) this equation becomes:

The system of equations that gives the maximum of the right hand side of the last equation is:

(50) |

or:

The system can be put in the following form:

or equivalently:

Computing the square of each equation it results:

By adding the two equations it can be obtained:

Substituting this result in the two equations of the system in (52) it can be written:

So the input-output relation of the bishrink filter is:

It can be observed that the bishrink filter is an estimator of the adaptive STF type. In this case the threshold’s value is

The input-output relation of the bishrink filter (55) can be put in the following form:

So, it can be written:

The absolute value of this sensitivity is inversely proportional to

Another very important parameter of the bishrink filter is the local estimation of the marginal variance of the noise-free image

This is a decreasing function of

Secondly, the local variance of a pixel gives some information about the frequency content of the region to which the considered pixel belongs. If the pixels of a given region have low local variances then the considered region contains low frequencies. If these pixels have high local variances then the considered region contains high frequencies.

The bishrink is a local bivariate MAP filter. Its performance depends on the quality of the estimation of a parameter, the local variance of the noiseless component of the acquired image. The quality of this estimate depends on the shape and the size of the estimation window. These estimation windows have different shapes in subbands with different preferential orientations highlighting the better directional selectivity of DTCWT and HWT versus the DWT.

Other MAP filters can also be considered. A MAP filter constructed on the basis of a bivariate Cauchy model is described in [Achim&Kuruoglu, 2005]. The Pearson statistical model is exploited for the construction of the marginal MAP filter presented in [Foucher et al., 2001]. The model of generalized Gaussian (GG) is used in [Argenti et al., 2006].

### 2.5. Despecklisation systems

We will compare in the following the classical despecklisation systems with some modern denoising systems conceived for the reduction of speckle noise in SONAR images based on complex WTs, namely the 2D DT CWT and the HWT.

The modern systems have the architecture presented in figure 12. The block named Sensitivity reduction corrects the drawbacks of the additive noise denoising kernel. One of the disadvantages of homomorphic denoising methods is the introduction of an undesirable bias. The expectations of the result and of the noiseless component of the acquired image are different. Their difference is given by the bias already mentioned. The proposed despecklisation system corrects this bias with the aid of two mean computation systems. The correction is based on the property of the speckle noise to have unitary expectation. The first mean computation system in figure 1 estimates the mean of the acquired image which is equal with the mean of its noise-free component (because the speckle has unitary mean). The second mean computation system in figure 12 computes the mean of the image at the output of the Sensitivity reduction block. This value is extracted and the mean of the noiseless component of the acquired image is added.

Both additive noise denoising kernels which exploit the 2D DT CWT and the HWT use the bishrink filter associated with the corresponding complex WT. The architecture of the corresponding Sensitivity reduction blocks is different for the two complex WTs.

The bishrink filter can be used for despecklisation in the wavelets domain because its statistical model for the noise is appropriate as can be seen from the following experiment. The speckle can be modeled using a Rayleigh distribution with unitary mean. It is obtained computing the square root of a sum of squares of two white Gaussian noises having the same variance. The first image of figure 13 contains a normalized representation of the pdf in equation (43) particularized for a given variance of the noise. The second image of figure 13 represents the normalized bivariate histogram of the HWT coefficients **of the logarithm of the noise.** It was obtained considering the HWT coefficients

### 2.6. Simulations results

We present two types of simulation results: for synthesized speckle noise and for real SONAR images. The performance of the simulation results reported in this chapter is appreciated on the basis of some quality measures. The first one is the peak signal to noise ratio (PSNR) which is defined in the following. Let s and

where N_{p} is the number of pixels. The PSNR in dB is given by:

Because its computation requires the knowledge of s the PSNR is computed in the experiments associated with synthesized speckle noise. Another quality measure is represented by the method’s noise. For multiplicative noise (as the speckle is) the method’s noise is defined by the ratio between the acquired image and the denoising result. This noise must be as similar as possible with the noise which perturbed the noiseless component of the acquired image. As the PSNR, the method noise can be appreciated only in the experiments associated with synthesized speckle. A third measure of quality of the denoising of a homogeneous region is given by the enhancement of the equivalent number of looks (ENL):

The enhancement of ENL is computed as the ratio of the output and input ENLs of the same region. This is a quality measure which can be appreciated in the case of real SONAR images as well. We have used two types of synthesized speckle. The first type is synthesized using a Rayleigh distribution with unitary mean and it is associated with the image Lena. The second type is synthesized using the method proposed in [Walessa&Datcu, 2000] and is associated with the test image presented in figure 14 [Walessa&Datcu, 2000].

To run these simulations on real SONAR images we have used a database obtained from IFREMER Brest France. We are thankful for this opportunity. Besides these three objective quality measures, we considered the visual aspect of the results as well.

#### 2.6.1. Synthesized images

We have applied first the classical despecklisation methods to the test image in figure 14 obtaining the results presented in figure 15. The corresponding filters are indexed with two indices, the first one specifying the size of the analysis window (for example 7 for a rectangular window with size 77) and the second one the number of looks of the image treated (1 in all the three cases).

The PSNRs obtained separating the sub-image Lena from the test image and from the results in figure 15 are indicated in Table I and compared with the results obtained by applying the denoising methods which associates the bishrink filter with the 2D DT CWT (fifth column) and with the HWT (sixth column) respectively.

The superiority of the denoising methods acting in the wavelet domain is obvious. The comparison in figure 16 permits the appreciation of the visual aspect of the results of the methods based on 2D DT CWT and HWT for the treatment of the test image in figure 14. These results are compared in figure 16 with the result of another despecklisation method applied in the spatial domain [Walessa&Datcu, 2000]. The superiority of the directional selectivity of the methods based on complex wavelets can be observed comparing the interior of the regions highlighted in yellow. Indeed the methods based on wavelets conserve better the details of the hat. The result with the best visual aspect in figure 16 corresponds to the utilization of the HWT. In figure 17 are compared the results of three different methods based on wavelets. The first two despecklisation methods are those based on 2D DT CWT and HWT and the third one was proposed in [Argenti et al., 2006] and is based on the association of the 2D UDWT with a MAP filter. The 2D UDWT is computed either with the aid of the Daubechies mother wavelets with eight vanishing moments, db8 or with the pair of biorthogonal mother wavelets bior9.7. The first denoising algorithm proposed in [Argenti et al., 2006] performs a local linear minimum mean square error (LLMMSE) filtering in the UDWT domain. The second one uses a MAP filter constructed supposing that the noise-free wavelet coefficients and the wavelet coefficients of the noise are distributed according to Generalized Gaussian Distributions.

The parameters of those pdfs are estimated for each pixel of the input image. The corresponding MAP filter equation is solved with the aid of numerical methods. All the three results in figure 17 have a good visual aspect. The methods based on complex wavelets (2D DT CWT and HWT) prove a better output PSNR, a better directional selectivity, a better treatment of the contours and textures and a better contrast preservation but the method based on 2D UDWT treats better the homogeneous regions. The method’s noise corresponding to the use of 2D DT CWT is quite identical with the original noise (compare images (b) and (d)) some differences appearing in the dark regions of the image Lena.

#### 2.6.2. Real images

In figure 18 is made a comparison of the despecklisation methods based on 2D DT CWT and HWT applied to a very difficult SONAR image. The first picture in figure 18 represents the acquired SONAR image. It is very difficult because it has small size (200320) and because the relief of the sea floor is flat in the region considered. So, all the pixels have intensities of similar values. The image was artificially colored, to see better the small intensity differences. Even the specular region can be observed with difficulty in this picture. This image is very noisy. It can be observed, analyzing the results of denoising, a demarcation line between two regions (the blue one and the green one) with pixels having different intensities. The difference between the intensities of the pixels belonging to the two regions is smaller than 1 dB. So, the proposed denoising method increases the discrimination capability of the analyst. Another consequence of the denoising based on wavelets is the better separation of the specular region from the rest of the scene. It can be also observed that a great amount of noise was eliminated by the proposed denoising methods. Comparing the second and the third pictures in figure 18, it can be noticed that the result of the despecklisation method based on the 2D DT CWT is over smoothed. So, the despecklisation method based on HWT seems to be the best choice. The comparison of the despecklisation methods based on 2D DT CWT and HWT is continued in figure 19, where other two SONAR images are considered. On the first column are represented the row SONAR images, on the second column are represented the results obtained applying the despecklisation method based on 2D DT CWT and on the last column are represented the results obtained applying the despecklisation method based on HWT. Analyzing figure 19, it can be observed that the two despecklisation methods based on wavelets are equivalent from the visual aspect point of view. Finally, we compare in figure 20 the two denoising methods based on 2D DT CWT and HWT respectively using a SONAR image containing a higher amount of information. The performance obtained by the two despecklisation methods based on wavelets for homogeneous regions is certificated by the important enhancements of ENL obtained considering a region of 1201000 pixels. The gain in performance of the method based on HWT can be explained by the superiority of HWT versus 2D DT CWT in despecklization applications.

## 3. Conclusion

This book is dedicated to SONAR systems. Despite the actual proliferation of this type of images, there are not numerous publications dealing with their denoising. In this chapter was presented the particular case of SONAR images starting with an overview of speckle removal techniques both in the spatial domain and in the wavelet domain. We have proved by simulations the superiority of the methods applied in the wavelet domain. These methods were classified according to the type of the wavelet transform used: 2D DWT, 2D

UDWT, 2D DT CWT and HWT. In the case of the 2D DT CWT and HWT, we have used homomorphic filtering techniques and applied the bishrink filter in the ‘additive noise denoising kernel’ block, in figure 12. The proposed algorithms use two of the best WTs, the 2D DT CWT and the HWT and a very good MAP filter which can be associated for despecklisation purposes. The undesirable bias introduced by the homomorphic methods is corrected using a clever combination of two expectation computation blocks.

The results obtained were satisfactory, especially those in which HWT with Biorthogonal 9/7 as mother wavelet was associated with the bishrink filter, followed by a supplementary correction applied to homogeneous areas. We have proved by simulation that the HWT is a better choice than the DWT, the UWT or the DT CWT for SONAR images despecklisation. The denoising method based on HWT outperforms other remote sensing images denoising methods from the visual aspect (it preserves better the contrast of the noiseless component of the image which must be denoised), the PSNR enhancement and the enhancement of ENL points of view. It is faster than the other despecklisation methods as well.

## Acknowledgements

The research reported in this chapter was developed in the framework of a grant funded by the Romanian Research Council (CNCSIS) with the title “Using wavelets theory for decision making” no. 349/13.01.09. We have established collaboration with the specialists from the French Sea Institute, IFREMER, from Brest; Xavier Lurton and Jean-Marie Augustin, with respect to the denoising of SONAR images. We also acknowledge the contribution of Professor Jean-Marc Boucher, the Ph. D. Advisor of Ioana Firoiu.