## 1. Introduction

### 1.1. Introduction to the à trous wavelet

In 1992, Mallat and Zhong designed a fast algorithm for the orthogonal wavelet transform (OWT) of a discrete signal *f*_{0}(*x*) having finite energy, based on level-by-level filtering with a pair of filters: a low-pass filter *h*(*n*) and a high-pass filter *g*(*n*). For the original image *A*_{0}, the OWT can be achieved as:

The reconstruction can be achieved by the inverse OWT (IOWT) as:

In (1) and (2), *r* = 1, 2, …, *N* denotes the decomposition level, *h*(*n*) and *g*(*n*). *A*_{r} denotes the low frequency component of *A*_{0} in the horizontal and vertical directions. Similarly, *r*. The high frequency components represent the detail and edge information, while the low frequency component represents the coarse information.

As a simple example, the pair of low-pass filter *h*(*n*) and high-pass filter *g*(*n*) can be given by *h*(*n*) = [0.7071, 0.7071] and *g*(*n*) = [-0.7071, 0.7071].
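The effect of these two filters can be illustrated with a minimal sketch. It assumes a 1-D signal of even length processed in non-overlapping sample pairs, and the function names `haar_analysis` and `haar_synthesis` are illustrative only; they do not come from the original algorithm description.

```python
import math

S = 1.0 / math.sqrt(2.0)   # 0.7071..., the coefficient quoted in the text
H = [S, S]                 # low-pass filter h(n)
G = [-S, S]                # high-pass filter g(n)


def haar_analysis(signal):
    """One decomposition level: filter each sample pair, keep one output per pair."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx, detail = [], []
    for k in range(0, len(signal), 2):
        x0, x1 = signal[k], signal[k + 1]
        approx.append(H[0] * x0 + H[1] * x1)   # coarse (low frequency) part
        detail.append(G[0] * x0 + G[1] * x1)   # detail (high frequency) part
    return approx, detail


def haar_synthesis(approx, detail):
    """Inverse of haar_analysis: rebuild each sample pair from (approx, detail)."""
    out = []
    for a, d in zip(approx, detail):
        out.append(H[0] * a + G[0] * d)
        out.append(H[1] * a + G[1] * d)
    return out


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
approx, detail = haar_analysis(signal)
restored = haar_synthesis(approx, detail)
```

Each output has half the length of the input, which is the subsampling by two discussed in this section, and `restored` matches `signal` up to floating-point rounding.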

The OWT is a popular method for fusing multisensor images. It decomposes an image with a wavelet basis according to a pyramid scheme. The resolution is halved at each level by subsampling the data by a factor of two. Each level produces one low frequency component and horizontal, vertical and diagonal detail components. The complete decomposition contains the same number of pixels as the original image.

The OWT can be used to improve the quality of the fused image. However, some limitations exist: 1) the OWT is applied to discrete images whose sizes are powers of two, because the resolution is reduced by two at each level, so it is not possible to fuse images of arbitrary size; 2) pixel-by-pixel analysis is not possible because the data are reduced at each resolution, so the evolution of a dominant feature cannot be followed through the levels; 3) no satisfactory rule exists that guarantees good fusion quality with the OWT (Chibani and Houacine, 2003).

For the OWT, the down-sampled multiresolution analysis does not preserve translation invariance, *i.e.* a translation of the original signal does not necessarily imply a translation of the corresponding wavelet coefficients. Therefore, wavelet coefficients generated by an image discontinuity could disappear arbitrarily. This nonstationarity in the representation is a direct consequence of the downsampling operation. In order to preserve translation invariance, the stationary wavelet transform was introduced (Garzelli 2002). The redundant wavelet transform (RWT) overcomes the limits of the OWT and allows great flexibility in defining fusion rules. The RWT can be implemented using the *à trous* (holes) algorithm as:

The original signal can be reconstructed by adding the set of wavelet coefficients for all scales with the last approximation scale *f*_{J}(*x*) as

The RWT of an image is accomplished by separable filtering along rows and columns, respectively. Specifically, a single wavelet plane is produced at each scale by subtracting two successive approximations without decimation. Hence, the wavelet and approximation planes have the same dimensions as the original image.

The scaling function has a B_{3} cubic spline profile, and its use leads to a convolution with a 5×5 mask:
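The decomposition and reconstruction described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the image is a list of lists, borders are handled by mirroring (the source does not specify a boundary rule), and the 5×5 B_{3} mask is applied separably as the 1-D filter (1, 4, 6, 4, 1)/16 along rows and then columns. All function names are hypothetical.

```python
B3 = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]  # 1-D B3 cubic spline filter


def _mirror(i, n):
    """Reflect an out-of-range index back into [0, n-1] (assumed border rule)."""
    period = 2 * (n - 1) if n > 1 else 1
    i = abs(i) % period
    return i if i < n else period - i


def _smooth_1d(values, step):
    """Filter with B3 whose taps are spaced `step` samples apart (the holes)."""
    n = len(values)
    return [
        sum(c * values[_mirror(i + (k - 2) * step, n)] for k, c in enumerate(B3))
        for i in range(n)
    ]


def smooth_2d(image, step):
    """Separable filtering: rows first, then columns."""
    rows = [_smooth_1d(row, step) for row in image]
    cols = [_smooth_1d(list(col), step) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def atrous_decompose(image, levels):
    """Return (wavelet planes, last approximation); nothing is decimated."""
    approx = [[float(v) for v in row] for row in image]
    planes = []
    for j in range(levels):
        smoother = smooth_2d(approx, 2 ** j)
        # wavelet plane = difference of two successive approximations
        planes.append([[a - s for a, s in zip(ra, rs)]
                       for ra, rs in zip(approx, smoother)])
        approx = smoother
    return planes, approx


def atrous_reconstruct(planes, approx):
    """Add every wavelet plane to the last approximation."""
    out = [row[:] for row in approx]
    for plane in planes:
        out = [[o + w for o, w in zip(ro, rw)] for ro, rw in zip(out, plane)]
    return out


image = [[(7 * r + 3 * c) % 11 for c in range(6)] for r in range(6)]
planes, approx = atrous_decompose(image, levels=2)
restored = atrous_reconstruct(planes, approx)
```

Every plane in `planes` has the same 6×6 size as `image`, and `restored` equals `image` up to floating-point rounding, because the successive approximations cancel in the sum.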

The RWT method is based on the fact that, in the RWT decomposition, the images are successive versions of the original image at increasing scales. Thus, the first RWT planes of the high-resolution panchromatic image contain spatial information that is not present in the multispectral image. RWT-based image fusion can be carried out using either a substitution method or an additive method. In the substitution method, some of the RWT planes of the multispectral image are substituted by the corresponding RWT planes of the panchromatic image. In the additive method, the RWT planes of the panchromatic image are added to the multispectral image or to the intensity component of the multispectral images.

In the substitution method, the RWT planes of the multispectral image are discarded and substituted by the corresponding planes of the panchromatic image. In the additive method, by contrast, all the spatial information in the multispectral image is preserved, and the detail information from both sensors is used. The main difference between adding the panchromatic RWT planes to the multispectral images and adding them to the intensity component is that, in the first case, high frequency information is added to each multispectral image, while in the latter it modifies only the intensity. Thus, from the theoretical point of view, adding to the intensity component is a better choice than adding to each multispectral image (Núñez et al., 1999).
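The difference between the two methods can be made concrete with a small sketch. It rests on simplifying assumptions: 1-D signals stand in for co-registered, equally sampled image rows, borders are handled by edge replication, all planes are substituted or added, and the function names are hypothetical rather than taken from the cited work.

```python
B3 = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)  # 1-D B3 cubic spline filter


def smooth(signal, step):
    """B3 filtering with taps `step` samples apart; indices are clamped at the borders."""
    n = len(signal)
    return [
        sum(c * signal[min(max(i + (k - 2) * step, 0), n - 1)]
            for k, c in enumerate(B3))
        for i in range(n)
    ]


def rwt(signal, levels):
    """Undecimated decomposition: (wavelet planes, last approximation)."""
    approx, planes = [float(v) for v in signal], []
    for j in range(levels):
        smoother = smooth(approx, 2 ** j)
        planes.append([a - s for a, s in zip(approx, smoother)])
        approx = smoother
    return planes, approx


def fuse_substitution(ms, pan, levels):
    """Discard the multispectral planes and use the panchromatic planes instead."""
    pan_planes, _ = rwt(pan, levels)
    _, ms_approx = rwt(ms, levels)
    return [a + sum(p[i] for p in pan_planes) for i, a in enumerate(ms_approx)]


def fuse_additive(ms, pan, levels):
    """Keep the multispectral signal untouched and add the panchromatic planes."""
    pan_planes, _ = rwt(pan, levels)
    return [m + sum(p[i] for p in pan_planes) for i, m in enumerate(ms)]


ms = [10.0, 10.0, 12.0, 12.0, 20.0, 20.0, 18.0, 18.0]   # coarse band
pan = [9.0, 11.0, 10.0, 14.0, 22.0, 18.0, 19.0, 17.0]   # detailed band
fused_sub = fuse_substitution(ms, pan, levels=2)
fused_add = fuse_additive(ms, pan, levels=2)
```

In this sketch the two results differ exactly by the sum of the multispectral signal's own wavelet planes, which is the information the substitution method discards and the additive method keeps.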

## 2. Multivalued wavelet transform

### 2.1. Feature space

A remote sensing image carries information by sampling a real-valued space-time function of the observed earth's surface. The digital number values of a remote sensing image have various meanings, which include fractal geometry (Liu and Li 1997), roughness of the ground surface (Liu 2000), inner specialties (Eskicioglu and Fisher 1995), definition and contrast (Lu and Healy, Jr. 1994), and edge and boundary-dependent shape segmentation (Nikolov et al. 2000). They are expressed by the grey-values, the abstracted spectral reflectance, statistical elements (*e.g.*, mean and variance), the mutual relationship between neighboring pixels, and the grey-values of the same object, respectively. In the following text, these statistical attributes of the original image (*I*) are broken down into seven representative features with pseudo-formulae.

1. Setover: Setover (*S*) is an important connection between the specific observation of grey-value fluctuation and the usual intensity stability. It measures the total oscillation around the center through the absolute deviation of each grey-value from the mean *μ*_{I}. At the same time, removing *μ*_{I} improves the confidence and sensitivity with which anomalies can be located.

2. Visibility: Visibility (*V*) is inspired by the human visual system (Li et al. 2002) and is defined with *μ*_{I} and the standard deviation *σ*_{I}. Each of its elements is a contribution rate that scales local variation. It is equivalent to the projection of the corresponding setover onto *σ*_{I}.

3. Flat: The grey-values of a remote sensing image indirectly record the reflectance of the groundcover scanned by the surveying device. In order to eliminate the possible influence of sunshine, namely the average intensity, flat (*F*) is defined by dividing each grey-value by *μ*_{I}.

4. Gradient: Gradient (*G*) is described by the spatial frequency (Eskicioglu and Fisher 1995), following from the fact that the relationship between contiguous grey-values usually implies change. It describes how grey-values change relative to their neighbors and measures the overall activity level of the image.

*m* and *n* denote the row and column of the image *I*.

5. Contrast: Contrast (*C*) is another ratio: the difference between the grey-value of the current pixel and that of the background, divided by *μ*_{I}, which magnifies the maximum likelihood of variation-dependent identification. A high correlation exists between the contrast and the visibility of an image (Li et al. 2002).

6. Definition: In order to find out where and how strongly change occurs, definition (*D*) is defined with the minimum *m*_{i} of all grey-values, the current grey-value, and the total deflection *δ*_{I}. Definition expresses that the more abrupt the change is, the clearer the feature of the image becomes.

7. Curvature: Curvature (*U*) is a measure of the extent of deflection. It increases the accuracy with which smoothness or roughness is recognized; it is also an indicator of salient information that will actually guide the variation finder (Chakraborty et al. 1995).
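Three of these features are described precisely enough in words to be sketched in code. The following is an illustration under assumptions, not the original formulae: setover is taken as the per-pixel absolute deviation from *μ*_{I}, flat as the per-pixel ratio to *μ*_{I}, and gradient as the scalar spatial frequency in the form given by Eskicioglu and Fisher (row and column first differences over an image of *m* rows and *n* columns). The function names are hypothetical, and the remaining four features are omitted because their exact definitions cannot be recovered from the descriptions alone.

```python
import math


def mean(image):
    """Mean grey-value of an image given as a list of rows."""
    values = [v for row in image for v in row]
    return sum(values) / len(values)


def setover(image):
    """Assumed form: absolute deviation of each grey-value from the mean."""
    mu = mean(image)
    return [[abs(v - mu) for v in row] for row in image]


def flat(image):
    """Each grey-value divided by the mean (requires a non-zero mean)."""
    mu = mean(image)
    return [[v / mu for v in row] for row in image]


def spatial_frequency(image):
    """sqrt(RF^2 + CF^2) from squared first differences along rows and columns."""
    m, n = len(image), len(image[0])
    rf2 = sum((image[r][c] - image[r][c - 1]) ** 2
              for r in range(m) for c in range(1, n)) / (m * n)
    cf2 = sum((image[r][c] - image[r - 1][c]) ** 2
              for r in range(1, m) for c in range(n)) / (m * n)
    return math.sqrt(rf2 + cf2)


image = [[1.0, 2.0], [3.0, 4.0]]
S = setover(image)            # [[1.5, 0.5], [0.5, 1.5]]
F = flat(image)               # [[0.4, 0.8], [1.2, 1.6]]
G = spatial_frequency(image)  # sqrt(0.5 + 2.0)
```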

All of these features are evidently correlated with each other: when one rises or falls, so do the others. A feature vector formed in order from the above seven features can therefore be regarded as an element of a mathematical structure called feature space. This representation space benefits image processing and analysis by raising the precision of significance decisions, with the original image replaced by the feature vector as follows:

### 2.2. Multivalued wavelet transform

The multivalued wavelet transform (MWT) employed can be performed by applying the RWT to each feature of *I*_{0} as

The original feature vector *I*_{0} can be rebuilt perfectly as

For fusing one multispectral (*T*) image and one panchromatic (*P*) image, the *T* image is first resampled to the pixel size of the *P* image. The fuser that produces the fused image (*F*) is summarized as follows:

where *E*(*x*, *y*) denotes the value of the electing map at position (*x*, *y*), and *j* is the decomposition level.

## 3. Example

### 3.1. Fusing QuickBird images using à trous wavelet

The raw images were downloaded from http://studio.gge.unb.ca/UNB/images. These images were acquired by a commercial satellite, QuickBird, which collects one 0.7 m resolution panchromatic band (450-900 nm) and blue (450-520 nm), green (520-600 nm), red (630-690 nm) and near infrared (760-900 nm) bands at 2.8 m resolution. The QuickBird data set was taken over the Pyramid area of Egypt in 2002. The test images, of size 1024 by 1024 at a resolution of 0.7 m, were cut from the raw images and used as the HRPI and LRMIs. Fig. 1(a) displays the LRMIs as a color composite in which the red, green and blue bands are mapped into the RGB color space. The HRPI is shown in Fig. 1(b). The near infrared band is not shown because of the limited space in this paper, although it was processed and numerically evaluated. The study area is composed of various features such as roads, buildings and trees, ranging in size from less than 5 m up to 50 m. The HRPI clearly has better spatial resolution than the LRMIs, and more details can be found in it. Before the image fusion, the raw LRMIs were resampled to the same pixel size as the HRPI in order to perform image registration.

The resolution ratio between the QuickBird HRPI and the LRMIs is 1:4. Therefore, when performing the à trous based fusion algorithm, the à trous filter 2^{-1/2}(1/16, 1/4, 3/8, 1/4, 1/16), together with a decomposition level of two, is employed to extract the high frequency information of the HRPI. The fused images are shown in Fig. 1(c).
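As a hedged illustration of these settings, the sketch below builds the filter quoted above and the versions with holes that the à trous scheme uses at successive levels. It assumes that the number of levels is chosen as the base-2 logarithm of the resolution ratio, which is consistent with the two levels used here for a 1:4 ratio; the helper `dilate` and the constant names are hypothetical.

```python
import math


def dilate(kernel, level):
    """Insert 2**level - 1 zeros (the holes) between neighbouring filter taps."""
    gap = 2 ** level - 1
    out = []
    for i, tap in enumerate(kernel):
        out.append(tap)
        if i < len(kernel) - 1:
            out.extend([0.0] * gap)
    return out


# Filter quoted in the text: 2^(-1/2) * (1/16, 1/4, 3/8, 1/4, 1/16)
BASE = [2 ** -0.5 * c for c in (1 / 16, 1 / 4, 3 / 8, 1 / 4, 1 / 16)]

RATIO = 4                          # HRPI to LRMI resolution ratio of 1:4
LEVELS = int(math.log2(RATIO))     # assumed rule; gives 2 levels here

filters = [dilate(BASE, j) for j in range(LEVELS)]
lengths = [len(f) for f in filters]   # 5 taps at level 0, 9 taps at level 1
```

The nonzero coefficients are identical at every level; only their spacing doubles, so the filter covers a wider neighbourhood without any subsampling of the image.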

Visual inspection provides a comprehensive impression of image clarity and of the similarity between the original and fused images (Wang et al., 2005). By visually comparing all the HRMIs (Fig. 1(c)) with the LRMI (Fig. 1(a)), it is apparent that the spatial resolutions of the HRMIs are much higher than that of the LRMI. Some small spatial structure details, such as edges and lines, which are not discernible in the LRMI, can be identified individually in each of the HRMIs. Building corners, holes, and textures are much sharper in Fig. 1(c) than in Fig. 1(a) and can be seen as clearly as in Fig. 1(b). This means that the fusion method can improve the spatial quality of the LRMI during the fusion process.

### 3.2. Fusing TM and SPOT images using multivalued wavelet transform

In this section, three TM images (TM3=Red, TM4=near Infrared, TM5=Infrared) with 171×171 pixels and one SPOT image covering 5120×5120 m^{2} are fused using the MWT. For the MWT fuser, the three TM images are interpolated to a 10 m pixel size in advance; the SPOT image, the TM image and their feature sequences are decomposed with the RWT into three levels; and then the voting and electing fuser is applied from the first to the third level. Figures 2(a), 2(b) and 2(c) show, respectively, the original TM image as a colour composite in which TM3, TM4 and TM5 are coded in blue, green and red, the SPOT image, and the fused image.

Compared visually with the original TM image, the spatial discernment of the fused images for the pair of fusers is undoubtedly better. Some small features, such as edges and lines, which are not interpretable in the original TM image, can be identified individually in the fused images. Other large features, such as lakes, rivers and blocks, are much sharper than in the original TM image. This indicates that the fuser can assimilate spatial information from the SPOT image. Figure 2(c) retains fewer colours than figure 2(d), and recovery of the original colours is necessary for correct thematic mapping (Chibani and Houacine 2002). For instance, in figure 2(c), all of the green colours shown in the lower left part of figure 2(a) disappear. Second, with regard to clarity, a ‘spider-web’ shaped field in the left-of-centre part of figure 2(c) has a ‘salt-and-granule’ appearance.