Shortcomings of existing pan-sharpening methods.
Over the years, researchers have formulated various techniques for pan sharpening that attempt to minimize the spectral distortion, i.e., retain the maximum spectral fidelity of the MS images. On the other hand, if the use of the PAN-sharpened image is just to produce maps for better visual interpretation, then the spectral distortion is not of much concern, as the goal is to produce images with high contrast. To solve the color distortion problem, methods based on spatial frequency domain have been introduced and have demonstrated superior performance in terms of producing high spectral fidelity pan-sharpened images over spatial-scale methods.
- pan sharpening
- spatial scale
- spatial frequency analysis
- discrete wavelet transform
- non–subsampled contourlet transform
- pseudo-Wigner distribution
- urban planning
Earth resource satellites provide data covering different parts of the electromagnetic spectrum at different spatial, spectral, and temporal resolutions. To utilize these different types of image data effectively, a number of pan-sharpening techniques have been developed .
Further, in order to benchmark different image fusion techniques, image quality metrics have been used. There are two types of metrics used to evaluate image quality, namely, subjective (qualitative) and objective (quantitative). The objective of this chapter is to discuss the methodology of some of the prevalent existing techniques, as well as the mathematical representation of some of the standard existing evaluation indicators.
2. Pan-sharpening techniques
Pan sharpening is also known as image fusion, image integration, and multisensor data fusion. Over the years, a large number of pan-sharpening techniques have been developed and have placed into different categorizes. In this study, multiscale transform (MST)-based techniques have been discussed.
2.1 Multiscale transform-based pan-sharpening techniques
In recent years, multiscale transform (MST)-based pan-sharpening techniques have received a lot of attention, since they preserve the spectral fidelity in the pan-sharpened images. Further, it is more suitable for information representation, interpretation, and analysis [2, 3].
Many variations of the multiscale transform-based techniques exist, such as discrete wavelet transform (DWT), stationary wavelet transform (SWT), curvelet transform (CVT), contourlet transform (CT), and Non–subsampled contourlet transform (NSCT) . The next subsections give a descriptive overview and methodology of MST-based pan-sharpening techniques which are selected for this study.
2.1.1 Discrete wavelet transform (DWT)
Before discussing about discrete wavelet transform, first of all, it would be appropriate to discuss in general regarding Fourier transform (FT).
Fourier transform (FT) was first invented by French mathematician and physicist Jean Baptiste Joseph Fourier in 1822. Fourier stated that any periodical function can be represented as a sum of sine and cosine of different frequencies, each multiplied by a different coefficient [5, 6]. Fourier transform converts a signal from the time-amplitude domain to the frequency-amplitude domain. Images are considered as 2-D discrete functions. To use Fourier transform to analyze images, discrete Fourier transform (DFT) is used. FT is a reversible transform, which means the original signal can be recovered through the inverse discrete Fourier transform (IDFT) [7, 8].
However, FT has a drawback, i.e., it does not provide the information about the time at which the particular frequency exists in the signal. Fourier transform only captures the different frequencies in a signal and cannot detect when those frequencies occurred. To overcome this drawback, wavelet transform (WT) was introduced. Wavelet transform (WT) can be more useful than Fourier transform, since it is based on functions that are localized in both space and frequency/scale . Wavelet transform brings a multiresolution framework. With this setting, the signal can be decomposed into components that collect the information at a specified scale, i.e., different frequencies are analyzed with different resolutions [2, 3, 4, 5, 6]. The WT has numerous applications in remote sensing such as image registration, spatial and spectral fusion, feature extraction, speckle reduction, texture classification, and crop phenology detection .
Wavelet transform can be broadly classified into two main groups, i.e., continuous wavelet transform (CWT) and discrete wavelet transform (DWT). Since CWT is continuous, as a result, there are an infinite number of scale and translation parameters which leads to an infinite number of possible wavelet functions. To overcome the shortcoming of CWT, DWT was introduced.
In the DWT algorithm, an image can be analyzed by passing it through an analysis filter bank followed by decimation operation. The analysis filter bank consists of low pass and high pass filter at each decomposition stage. When a signal passes through these filters, it splits in to two signals. The low pass filter, which corresponds to an averaging operation, extracts the coarse (average) information of the signal. The high pass filter, which corresponds to a differencing operation, extracts the detail information of the signal such as edges, points, and lines. The output of the filtering operation is then decimated by two, i.e., a 2-D transform is accomplished by performing two separate one-dimensional transform [9, 10, 11, 12]. First of all, the image is filtered along the row and decimated by two, and it is then followed by filtering the subimage along the column and decimated by two.
This operation splits the image into four bands namely one approximation band, which contains coarse information and three detail bands, horizontal, vertical, and diagonal, respectively, which contain information about the salient features of the image such as edges, points, and lines [5, 8]. A J-level decomposition can be performed resulting in different frequency bands. At each level of decomposition, the image is split into high and low frequency components; the low-frequency components can be further decomposed until the desired resolution is reached [13, 14, 15]. The pan-sharpening procedure for the pan sharpening of panchromatic (PAN) and multispectral (MS) images using DWT has been explained in Section 3.1 (Figure 1).
2.1.2 Stationary wavelet transform (SWT)
It is observed that discrete wavelet transform (DWT) is not a shift-invariant transform. Therefore, in order to get rid of this problem, stationary wavelet transform (SWT)-based fusion technique, an extension of DWT scheme, also known as “à trous” algorithm, has been introduced [10, 11]. In the “à trous” algorithm, the downsampling step is suppressed and instead the filter is upsampled by inserting zeros between the filter coefficients (Figure 2).
In the SWT algorithm, it uses a two-dimensional filter derived from the scaling function. This produces two images, of which one is an approximation image while the other is a detailed image called the wavelet plane. A wavelet plane represents the horizontal, vertical, and diagonal detail between and resolution and is computed as the difference between two consecutive approximations levels. All the approximation images obtained, by applying this decomposition, have the same number of columns and rows as the original image, since filters at each level are upsampled by inserting zeros between the filter coefficients and make the size of the image same [16, 17, 18, 19].
The procedure for the pan sharpening of PAN and MS images using SWT can be summarized as follows (Figure 3):
To generate new panchromatic images, match histograms of PAN image to their corresponding MS image.
Perform the second-level wavelet transform only on the modified PAN image.
The resulting wavelet planes of PAN are added directly to each MS images.
The SWT eliminates the shift sensitivity problem at the cost of an overcomplete signal representation. However, it does not resolve the problem of feature orientation. In addition, the discrete wavelet transform (DWT), and stationary wavelet transform (SWT), cannot capture curves and edges of images well. Wavelets perform well only at representing point singularities, i.e., appropriate to represent linear edges, since they ignore the geometric properties of structures and do not exploit the regularity of edges.
For curved edges, the accuracy of edge localization in the wavelet transform is low. So, there is a need for an alternative approach, which has the potential or capability to detect, represent, and process high-dimensional data. In order to solve this problem, multiscale geometric analysis has been further investigated. As a result, Candès and Donoho  have proposed the concept of curvelet transform (CVT).
Further, in order to solve the problem of curvelet transform, which is first developed in continuous domain and then does discretization of images or signals of interest, Yang et al.  and Do and Vetterli  presented a flexible multiresolution, local, and directional image expansion using contour segments, named contourlet transform. However, due to the downsampling and upsampling, the CT lacks shift invariance and thus results in ringing artifacts . To overcome the weakness of wavelets, curvelets, and contourlets, Cunha et al.  proposed non–subsampled contourlet transform (NSCT), based on non–subsampled pyramid decomposition (NSPD) and non–subsampled filter bank (NSFB).
2.1.3 Non–subsampled contourlet transform (NSCT) technique
In order to reduce the frequency aliasing of contourlets and enhance directional selectivity and shift invariance, Holschneider and Tchamitchian  proposed non–subsampled contourlet transform. This is based on the non–subsampled pyramid filter banks (NSPFBs) and the non–subsampled directional filter banks (NSDFBs) structure. The former provides multiscale decomposition using two-channel non–subsampled 2-D filter banks, while the later provides directional decomposition, i.e., it is used to split band pass subbands in each scale into different directions [25, 26].
As a result, NSCT is shift invariant and leads to have better frequency selectivity and regularity than CT [25, 26, 27, 28]. The scheme of NSCT structure is shown in Figure 4(a). The NSCT structure classifies two-dimensional frequency domain into wedge-shaped directional subband as shown in Figure 4(b).
In order to provide more practical and flexible solution to the existing problem as stated above, there is a need for an improved or a new fusion technique, which is superior among all the existing pan-sharpening techniques. A new pan-sharpening technique should ideally possess properties of shift invariance, directionality, low computational complexity, and low computational time, applicable to real-time image processing tool, and is also efficient in capturing intrinsic geometrical structures of the natural image along the smooth contours. Moreover, it should perform efficiently under all categories of datasets, such as very high, high, and medium resolution satellite datasets. A spatial frequency-based technique should ideally possess properties, such as shift invariance, directionality, low computational complexity, and low computational time, applicable to real-time image processing tool, and is also efficient in capturing intrinsic geometrical structures of the natural image along the smooth contours [27, 28]. Thus, in order to resolve the existing problems, pan-sharpening method based on joint spatial frequency domain such as pseudo-Wigner distribution has been introduced.
3. Spatial-frequency based pan-sharpening technique
Analysis of non-stationary 2-D signals (image) is a challenging job, as their spectral properties change with time. Such signals cannot be analyzed well by pure spatial domain and frequency domain representations. The joint spatial frequency domain-based image analysis methods, such as Wigner Ville distribution (WVD) and pseudo-Wigner distribution (PWD), have been proven to be a powerful tool for analyzing, understanding, and detection of spatial frequency characteristics of non-stationary images in a more comprehensive manner.
The use of Wigner Ville distribution for image processing was first suggested by . It was shown that WVD is a very efficient and powerful tool for capturing the essential non-stationary image structures  and appears as a new promising method for the characterization of local spectral properties of images. The Wigner Ville distribution has many interesting properties related to translation, modulation, scaling, convolution, and localization in spatial frequency space, real-valued function and contains phase information, which motivates its use in the field of image analysis applications. Since WVD suffers with the serious problem of interference that makes the interpretation impossible, thus to resolve the limitation of WVD, pseudo-Wigner distribution (PWD) was introduced.
3.1 Pseudo-Wigner Distribution (PWD) technique
Spatial frequency information of a non-stationary image can be effectively extracted with one of the well-known spatial frequency technique known as pseudo-Wigner distribution (PWD). PWD is ideally suited for representing a nonstationary image in the spatial frequency domain and is carried out by adapting the fast Fourier transform (FFT) algorithm. The significant properties of PWD motivate its use in the field of image processing, especially for the fusion of satellite images [30, 31]. These properties are as follows:
PWD provides a pixel-wise analysis, efficient and powerful tool for capturing the essential nonstationary image structures, as well as for the characterization of local spectral properties of images, which is indispensable for image fusion.
PWD is shift-invariant technique. Shift-invariant property is necessary for a high-quality and effective image fusion. In the absence of shift invariance, artifacts, such as aliasing effect, loss of linear continuity in spatial features, become prevalent in the resulting fused image (Figure 5).
Multidirection, i.e., the window can be tilted in any direction to obtain a directional distribution.
Computation time for PWD is generally small.
With reference to Table 1, pseudo-Wigner distribution (PWD) overcomes the shortcomings of the traditional Fourier-based methods, discrete wavelet transform (DWT), stationary wavelet transform (SWT), curvelet transform (CT), contourlet transform (CT), and non–subsampled contourlet transform (NSCT). Consequently, it is not based on a multiscale decomposition procedure as wavelets and contourlets are. Further, one of the most challenging applications that comes across by the remote sensing experts is to fuse MS and PAN images collected from different or same satellite sensor with each other to achieve a pan-sharpened image, without introducing artifacts or inconsistencies; otherwise it may damage the quality of the fused image.
|DWT||Poor directionality, lack of shift invariance|
|SWT||Limited directional selectivity|
|NSCT||Time-consuming, blocking artifacts|
Thus, the goal of pan-sharpening is to produce pan-sharpened images with the highest spectral fidelity possible, as the importance of such images in various applications, ranging from land use/land cover classification to road extraction. Therefore, preserving the spectral information of the original MS images in the pan-sharpened images is of great importance [31, 32, 33]. Therefore, an attempt to utilize the concept of pseudo-Wigner distribution (PWD) for the pan-sharpening of high-resolution PAN image with a low-resolution MS image has been introduced.
3.1.1 Mathematical background of pseudo-Wigner distribution
Let us consider an arbitrary 1-D discrete function . The PWD of a given array of pixels is given by Eq. (1).
where and represent the spatial and frequency discrete variables, respectively, and is a shifting parameter. Eq. (1) can be interpreted as the discrete Fourier transform (DFT) of the product . Here, indicates the complex conjugate of 1-D sequence, . is a matrix where every row represents the pixel-wise PWD of the pixel at position . Further, is a 1-D sequence of data from the image, containing the gray values of N pixels, aligned in the desired direction. By scanning the image with a 1-D window of pixels, i.e., shifting the window to all possible positions over the full image, the full pixel-wise PWD of the image is produced. The window can be tilted in any direction to obtain a directional distribution [34, 35]. Further, the reasons for selecting short 1-D window for PWD analysis are as follows:
It greatly decreases the computational cost.
It allows to obtain a pixel-wise spectral analysis of the data.
Coregister both the source images and resample the multispectral image to make its pixel size equal to that of the PAN, in order to avoid the problem of misregistration.
Apply DWT/NSCT/PWD to all input coregistered images, one by one, to get their respective coefficients according to the mathematical decomposition procedure related to each one of the techniques, along with upsampling and histogram matching.
The obtained coefficients generated in step (i) from the different input images, i.e., MS and histogram-matched PAN image, are combined according to defined fusion rules to get the fused coefficients.
The fused coefficients are subject to an inverse DWT/NSCT/PWD to construct the fused image. As a result, a new MS image with higher spatial resolution is obtained.
As a result, a new multispectral image with higher spatial resolution is obtained. This process is repeated for each individual MS and PAN band pair. Finally, all the new fused bands are concatenated to form a new fused multispectral image.
It may be noted that each MST technique (DWT, NSCT, and PWD) has its unique mathematical properties, which leads to different image decomposition procedure of an image.
3.2 Comparative assessment of various pan-sharpening techniques
Pan-sharpening techniques, belonging to color, statistical, and multiscale transform-based techniques, have been evaluated in terms of certain parameters, such as spectral distortion, shift invariance, directionality, and computational complexity. Comparative assessment of various pan-sharpening techniques has been shown in Table 2.
|Grade||Absolute measure||Relative measure|
|1||Excellent||The best in group|
|2||Good||Better than the average in group|
|3||Average||Average level in group|
|4||Poor||Lower than the average level|
|5||Very poor||The lowest in the group|
4. Fusion rules
There are various fusion rules to combine the fused coefficients. Let and denote the coefficients for higher spatial resolution PAN image and for the lower spatial resolution MS image, and denotes the coefficient of the fused image. Using these notations, following fusion rules can be summarized as follows:
Average fusion rule
The average fusion rule takes the average of the coefficients of the , PAN, and , MS images, which is given by Eq. (2).E2
Maximum fusion rule
The maximum fusion rule compares the coefficients from the , PAN, and , MS images, and picks the larger magnitudes as the fused coefficients, which is given by Eq. (3).
5. Assessment of accuracy for pan-sharpening techniques
Pan-sharpening algorithms are designed to produce good-quality pan-sharpened images. A fused image would be considered perfect quality if the spatial detail missing in the MS image is transferred from the panchromatic image without distorting the spectral content of the multispectral image . Unfortunately, this is not possible. There is a trade-off between enhancement of spatial detail and spectral distortion. A fully spatially enhanced fused image would be the panchromatic (PAN) image itself, while an image free of spectral distortion would be the original multispectral (MS) image .
The diversity of datasets has contributed to the development of different types of techniques and procedures for the implementation of image fusion. In order to benchmark different pan-sharpening techniques, image quality metrics have been used, i.e., quality metrics are required to evaluate the quality of the fused images [37, 38]. There are two types of metrics used to evaluate image quality:
5.1 Qualitative evaluation
Qualitative analysis deals with the visual comparison of the original PAN and MS images with that of the fused image, in terms of spectral and spatial distortion. The evaluation results vary depending on the intensity, sharpness, existence of noisy areas, missing spatial detail, and distortions in the geometry of the objects and display conditions of the image. A number of viewers will be shown the images and asked to judge the image quality. These may also vary from observer to observer, i.e., interpretation of image quality may be influenced or varied by personal preference [39, 40]. Therefore, an exact decision cannot be given. Further, these methods are time-consuming, inconvenient, and expensive.
5.2 Quantitative evaluation metrics
It is evident that, in most cases, there is slight difference among fusion results, i.e., quantitative evaluation methods sometimes produce results that cannot be sustained by visual inspection. However, there is no universally accepted metric to objectively evaluate the image fusion results. The generated pan-sharpened images are compared from diverse perspectives of image visualization, coherence, structural similarity, and spectral information content.
The well-known full-reference objective metrics are correlation coefficient (CC), root mean square error, peak signal-to-noise ratio . The reason behind selecting these evaluation indicators is that they measure the statistical, structural similarity, and spectral distortion introduced by the pan-sharpening process. The quantitative metrics that are used in this study, as well as the mathematical representation of these measures, have been discussed below.
5.2.1 Root mean square error
Root mean square error (RMSE) is a frequently used measure of the differences between the fused and the original images. RMSE is a good measure of accuracy . Smaller RMSE value represents a greater accuracy measure and is explained by Eq. (4).
where indicates size of the image and indicate the fused image and the original image, respectively.
5.2.2 Peak signal-to-noise ratio
Peak signal-to-noise ratio (PSNR) indices reveal that the radiometric distortion of the fused image is compared to the original image. PSNR can reflect the quality of reconstruction. The larger value of PSNR indicates less amount of image distortion  and is given by Eq. (5).
where L is related to the radiometric resolution of the sensor; for example, L is 255 for an 8-bit sensor and 2047 for a 16-bit sensor.
5.2.3 Correlation coefficient
The correlation coefficient (CC) of two images is often used to indicate their degree of correlation. If the correlation coefficient of two images approaches one, it indicates that the fused image and original image match perfectly [40, 41]. High value of the correlation shows that the spectral characteristic of the multispectral image has been preserved well. The correlation coefficient is represented by Eq. (6)
where and are the elements of the images x and y, respectively, and and stand for their mean values.
5.2.4 Spatial correlation coefficient
In order to assess the spatial quality of the fused image quantitatively, procedure proposed by  has been adopted. This approach is used to measure the amount of edge information from the PAN image, which is transferred into the fused images. The high spatial resolution information missing in the MS image is present in the high frequencies of the PAN image. The pan-sharpening process inserts the higher frequencies from the PAN image into the MS image. Therefore, the CC between the high pass filtered PAN and the fused images would indicate how much spatial information from the PAN image has been incorporated into the MS image. A higher correlation between the two high pass filtered images implies that the spatial information has been retained faithfully. This CC is called the spatial correlation coefficient (SCC). In order to extract the spatial detail of the images to be compared, following Laplacian filter has been used and is represented by Eq. (7).
The pan-sharpened image which will best preserve the spectral and structural information of the original low resolution MS image is the one that has satisfied the following conditions (Table 3).
|Metric||Ideal value||Error value|
|Root mean square error (RMSE)||0|
|Peak signal-to-noise ratio (PSNR)||NA|
|Correlation coefficient (CC)||1||and|
|Spatial correlation coefficient (SCC)||1||and|
This chapter provides the methodology of the proposed approaches for the pan-sharpening of satellite images, along with the discussion of some prevalent existing multisensor pan-sharpening techniques and well-known evaluation indicators.