Open access peer-reviewed chapter

Hyperspectral and Multispectral Image Fusion Using Deep Convolutional Neural Network - ResNet Fusion

Written By

K. Priya and K.K. Rajkumar

Submitted: 06 May 2022 Reviewed: 18 May 2022 Published: 19 July 2022

DOI: 10.5772/intechopen.105455

From the Edited Volume

Hyperspectral Imaging - A Perspective on Recent Advances and Applications

Edited by Jung Y. Huang


Abstract

In recent years, deep learning-based HS-MS fusion has become a very active research tool for the super-resolution of hyperspectral images. Deep convolutional neural networks (CNNs) help to extract more detailed spectral and spatial features from the hyperspectral image. In a CNN, each convolution layer takes its input from the previous layer, which may cause a loss of information as the depth of the network increases. This loss of information causes vanishing gradient problems, particularly in the case of very high-resolution images. To overcome this problem, in this work we propose a novel HS-MS ResNet fusion architecture with the help of skip connections. The ResNet fusion architecture contains residual blocks with different numbers of stacked convolution layers; in this work we tested residual blocks with two-, three-, and four-stacked convolution layers. To strengthen the gradients and decrease the negative effects of gradient vanishing, we implemented the ResNet fusion architecture with different skip connections such as short, long, and dense skip connections. We measured the strength and superiority of our ResNet fusion method against traditional methods on four public datasets using standard quality measures and found that our method outperforms all other compared methods.

Keywords

  • convolution neural network
  • residual network
  • ResNet fusion
  • stacked layer
  • dense skip connection

1. Introduction

Spectral imaging technology captures a contiguous spectrum for each image pixel over a selected range of wavelength bands. Thus, spectral images accommodate more information than conventional monochromatic or RGB images. The wide range of spectral information available in hyperspectral images brings spectral imaging technology into a new horizon of research for analyzing pixel content at the macroscopic level. This change promises revolutionary developments in many walks of life in the coming future. In general, spectral images are divided into multispectral (fewer than 20 sampled wavelength bands) and hyperspectral (more than 20 bands). A multispectral image (MSI) captures a maximum of about 20 spectral bands, whereas a hyperspectral image (HSI) captures hundreds of contiguous spectral bands at a time. Due to this exciting prominence, HSI is now an emerging area that at the same time faces many challenges in analyzing the minute details of pixel content in image processing and computer vision [1].

Hyperspectral images (HSIs) are rich in spectral information, which greatly strengthens their information-storing ability. This property of HSI enables rapid growth in many areas such as remote sensing, medical science, the food industry, and various computer vision tasks. However, hyperspectral sensors capture all these bands over narrow wavelength ranges, which limits the amount of energy received by each band. Therefore, HSI information can easily be influenced by many kinds of noise, which lowers the spatial resolution of HSI [2].

Many studies have been introduced in the literature to control the tradeoff between the spatial and spectral resolution of hyperspectral images. As a result, many HS-MS fusion methods have evolved in the past decades to address it. This straightforward HS-MS fusion approach has become a popular and trending research area in image processing and computer vision. The early approach is pansharpening-based image fusion, which fuses spectral and spatial information from low-resolution multispectral (LR-MS) images with high-resolution (HR) panchromatic (PAN) images to enhance the spatial and spectral resolution of the fused image. Subsequently, pansharpening algorithms were gradually extended to HS-MS image fusion [3].

In HS-MS fusion, a hyperspectral image with high spatial and spectral resolution is estimated by fusing an LR-HS image with an HR-MS image of the same scene. However, the quality of the estimated spatial and spectral data is highly influenced by the constraints used in the fusion process. Recently, neural network-based methods have been widely used to improve HS-MS fusion quality in both the spatial and spectral domains. One such network, the convolutional neural network (CNN) in deep learning (DL), performs much better in image reconstruction, super-resolution, object detection, etc. [4].

In a CNN, each layer takes the output of the previous layer as input, which tends to lose information as the network goes deeper. In this work, we use ResNet-based HS-MS fusion by adding skip connections between the convolution layers. A skip connection helps to carry identity information throughout the deep convolutional network [5].

The remaining sections of this paper are arranged as follows: Section 2 reviews the literature on HS-MS fusion methods, both traditional and deep learning-based. Section 3 describes the materials and methods used in this work. Sections 4 and 5 present the problem formulation and the implementation of our work in detail. The results of our proposed method are discussed in Section 6, and finally, Section 7 concludes the proposed work with future scope.


2. Review of literature

2.1 Traditional methods

Many algorithms have been proposed in past decades to enhance the spatial quality of HS images. One popular and attractive approach is HS-MS image fusion, which is mainly divided into four groups: component substitution (CS), multi-resolution analysis (MRA), Bayesian approaches, and spectral unmixing (SU) [6]. The CS and MRA methods are described under the concept of an injection framework, in which high-quality information from one image is injected into another [7]. Bayesian methods, by contrast, use the posterior distribution of prior information about the target image, conditioned on the given HS and MS images [8]. Later, spectral unmixing-based HS-MS image fusion was introduced and is one of the most promising and widely used methods for enhancing the quality of HS images.

In the SU method, the quality of the abundance estimation highly depends on the accuracy of the endmembers. Therefore, any obstruction during the endmember extraction process leads to inconsistency in the abundance estimation. To overcome this limitation, Paatero and Tapper in 1994 [9] introduced the nonnegative matrix factorization (NMF) method, which was popularized by Lee and Seung in 1999 [10]. It has become an emerging tool for processing high-dimensional data due to its automatic feature extraction capability. The main advantage of NMF is that it yields a unique solution to the problem compared with other unmixing techniques [11]. In general, NMF-based spectral unmixing jointly estimates both the endmembers and the corresponding fractional abundances in a single step, mathematically represented as follows:

Y = EA    (1)

where the output matrix Y is simultaneously factorized into two nonnegative matrices E (endmember) and A (abundance) without any prior knowledge; hence NMF falls under an unsupervised framework [12]. NMF has since become one of the trending methods for blind-source spectral unmixing problems. NMF factorizes the input matrix into a product of two nonnegative matrices (the endmember matrix E and the abundance matrix A) by enforcing nonnegativity, so the NMF method is highly relevant to SU for enhancing image quality under these constraints. Finally, SU-based fusion is accomplished by using the coupled NMF (CNMF) method to obtain an enhanced hyperspectral image with high spatial and spectral quality. The CNMF fusion algorithm gives a high-fidelity reconstructed image compared with other existing fusion methods [13].
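To make Eq. (1) concrete, the following is a minimal NumPy sketch of the classic multiplicative update rules of Lee and Seung [10] for this factorization; the rank k, iteration count, and initialization are illustrative choices, not values from the chapter.

```python
import numpy as np

def nmf(Y, k, n_iter=200, eps=1e-9):
    """Factorize Y (bands x pixels) as Y ~ E @ A per Eq. (1), with
    E (bands x k) the endmembers and A (k x pixels) the abundances."""
    rng = np.random.default_rng(0)
    L, N = Y.shape
    E = rng.random((L, k))   # nonnegative random initialization
    A = rng.random((k, N))
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates keep both factors nonnegative
        A *= (E.T @ Y) / (E.T @ E @ A + eps)
        E *= (Y @ A.T) / (E @ A @ A.T + eps)
    return E, A
```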

Yokoya et al. in 2012 [14] introduced the coupled nonnegative matrix factorization (CNMF) method, an unsupervised unmixing-based HS-MS image fusion. CNMF uses a straightforward approach to the unmixing and fusion processes, so its mathematical formulation and implementation are not as complex as those of other existing fusion methods. Finally, this method optimizes the solution with minimum residual error and reconstructs a high-fidelity hyperspectral image.

Simoes et al. in 2015 [15] introduced a super-resolution method for hyperspectral images termed HySure. This method formulates a model that preserves the edges between objects during unmixing-based data fusion. It uses an edge-preserving constraint called the vector total variation (VTV) regularizer, which preserves edges and promotes piecewise smoothness in the spatial content of the image.

Lin et al. in 2018 [16] introduced a convex optimization-based CNMF (CO-CNMF) method, incorporating a sparsity regularizer and a sum-of-squared-distances (SSD) regularizer. To extract high-quality data from the images, this method uses the SSD regularizer and enforces sparsity through ℓ1-norm regularization. Adding these two regularization terms through two convex subproblems helps to upgrade the performance of the existing CNMF method. However, performance degradation may occur in the CO-CNMF algorithm as the noise level increases; it is therefore necessary to add image denoising and spatial smoothing constraints to this fusion method.

Yang et al. in 2019 [17] introduced a total variation and signature-based regularized CNMF method named TVSR-CNMF. The TV regularizer is added to the abundance matrix to ensure the image's spatial smoothness. Similarly, a signature-based regularizer (SR) is added to the endmember matrix for extracting high-quality spectral data. This method thus helps to reconstruct a hyperspectral image with good spatial and spectral quality.

Yang et al. in 2019 [18] introduced a sparsity and proximal minimum-volume regularized CNMF method named SPR-CNMF. The minimum-volume regularizer controls and minimizes the distance between the selected endmembers and the center of mass of the selected region in the image to reduce computational complexity. It refines the solution at each iteration until it reaches the simplex with minimum volume. This method improves fusion performance by controlling the loss of cubic structural information.

Influenced by this work, we implemented an unmixing-based fusion algorithm named fully constrained CNMF (FC-CNMF). This method is a modified version of CNMF that includes all the spatial and spectral constraints available in the literature. In our method, a minimum-volume simplex constraint is imposed on the endmember matrix to fully exploit the spectral information. Similarly, sparsity and total variation constraints are incorporated into the abundance matrix to provide dimensionality reduction and spatial smoothness. Finally, we evaluated the quality of the fused image obtained by FC-CNMF against the methods discussed in the literature using standard quality measures. From these evaluations, we found that our method performs better, yielding higher fidelity in the reconstructed images.

These traditional approaches reconstruct the high-resolution hyperspectral image by fusing the high-quality data from hyperspectral and multispectral images. To improve the quality of the reconstructed images, they rely on different constraints such as sparsity, the minimum-volume simplex, and total variation regularization. The performance and quality of the reconstructed HS image are highly influenced by these constraints, so the existing methods still leave ample room for enhancing HSI quality.

2.2 Deep learning methods

Deep learning (DL) is a subbranch of machine learning (ML) that has recently shown remarkable performance in research fields, especially image processing and computer vision. DL is based on artificial neural networks and has been widely used in areas such as super-resolution, classification, image fusion, and object detection. DL-based image fusion methods can extract deep features automatically from the image. Therefore, DL-based methods overcome the difficulties faced by conventional image fusion methods and make the whole fusion process easier and simpler.

A deep learning-based HS-MS image fusion concept was first introduced by Palsson et al. in 2017 [19]. In this method, they used a 3-D convolutional neural network (3D-CNN) to fuse LR-HS and HR-MS images to construct an HR-HS image. This method improves the quality of the hyperspectral image while reducing noise and computational cost. However, they focused on enhancing the spatial data of the LR-HS image without any changes to the spectral information, which caused degradation of the spectral data [19].

Later, Masi et al. in 2017 [20] proposed a CNN architecture for image super-resolution that uses a deep CNN to extract both spatial and spectral features. The deep CNN is used to acquire features from HSI with a very complex spatial-spectral structure. However, the authors used a single-branch CNN architecture, which makes it difficult to extract discriminating features from the image.

To overcome this drawback, Shao and Cai in 2018 [21] designed a fusion method extending the CNN to the depth of a 3D-CNN for better fusion performance. For implementation, they used a remote sensing image fusion neural network (RSIFNN) with two separate CNN branches: one branch extracts the spectral data and the other extracts the spatial data from the image. In this way, the method exploits both the spectral and spatial information of the input images to reconstruct a hyperspectral image with high spectral and spatial resolution.

Yang et al. in 2019 [22] introduced a deep two-branch CNN for HS-MS fusion. This method uses a two-branch CNN architecture to extract spectral and spatial features from the LR-HSI and HR-MSI. The features extracted by the two branches are concatenated and then passed to fully connected layers to obtain the HR-HSI. In conventional fusion methods, the HR-HSI is reconstructed band by band, whereas in CNN-based methods all bands are reconstructed jointly, which helps to reduce the spectral distortion in the fused image. However, this method uses a fully connected layer for image reconstruction, which is heavily weighted and increases the number of network parameters.

Chen et al. in 2020 [23] introduced a spectral-spatial feature extraction fusion CNN (S2FEF-CNN), which extracts joint spectral and spatial features using three S2FEF blocks. The S2FEF method uses 1D and 2D convolution networks to extract spectral and spatial features, respectively, and then fuses them. It uses fully connected network layers for dimensionality reduction, which further reduces the network parameters during fusion. This method shows good results with less computational complexity than other deep learning-based fusion methods.

Although deep learning-based fusion methods have achieved tremendous improvements, they still possess many drawbacks [24]. As the network goes deeper, its performance saturates and then rapidly degrades. This is because each convolution layer takes as input the output of the previous layer, so by the time the last layer is reached, much of the meaningful information obtained in the initial layers has been lost. The information loss worsens as the network architecture gets deeper, bringing negative effects such as overfitting; this effect is called the vanishing gradient problem [25].

Due to the vanishing gradient problem, existing deep learning-based fusion cannot extract the detailed features of high-dimensional images. He et al. [5] introduced a deep network with residual learning to address the vanishing gradient problem. In this framework, a residual connection is added between the layers to diminish the performance degradation. Networks built on this concept are called residual networks, or ResNets. Therefore, in this work our aim is to bring this ResNet architecture into the standard CNN to extract more detailed features from both the spatial and spectral data of HSI.


3. Materials and methods

3.1 Dataset

Four real datasets are used in this work: Washington DC Mall, Botswana, Pavia University, and Indian Pines. The Washington DC Mall dataset is a well-known dataset captured by the HYDICE sensor; it covers a spectral range from 400 to 2500 nm with a 1278 × 307 pixel size and 191 bands. The Botswana dataset, captured by the Hyperion sensor over the Okavango Delta in Botswana, covers a spectral range from 400 to 2500 nm with a 1476 × 256 pixel size and 145 bands. The Pavia University dataset was captured by the Reflective Optics System Imaging Spectrometer (ROSIS-3) over the University of Pavia, northern Italy, in 2003; it covers a spectral range from 430 to 838 nm with a 610 × 340 pixel size and 103 bands. Finally, the Indian Pines dataset was captured by the AVIRIS sensor over the Indian Pines test site in northwestern Indiana, USA, in 1992; it covers a spectral range from 400 to 2500 nm with a 512 × 614 pixel size and 192 bands [25]. All these datasets have been widely used in earlier spectral unmixing-based fusion research.

3.2 Convolution neural networks

Convolutional neural networks (CNNs) play an important role in deep learning models. A CNN is an algorithm specially designed to work with images, extracting deep features through convolution. Convolution is a process that applies a kernel filter across every element of an image so that the network can understand and react to each element within the image. This concept of convolution is especially helpful for extracting specific features from high-dimensional images. A convolutional network architecture is composed of an input layer, an output layer, and one or more hidden layers. The hidden layers are combinations of convolution layers, pooling layers, activation layers, and normalization layers. These layers automatically detect essential features without any human supervision, so CNNs are considered a powerful tool for image processing [27]. The three main building blocks are described below, followed by a short code sketch.

  1. Convolution layer

    The convolution layer extracts various features from the input image with the help of filters. In the convolution layer, a mathematical operation is performed between the input image and a filter of m × m kernel size. The filter slides across the input image, and the dot product between the filter and the corresponding part of the image is computed. This process is repeated, convolving the kernel over the whole image, and the output of the convolution operation is called a feature map. The feature map contains essential information about the image, such as the boundaries and edges of objects [28].

  2. Pooling layer

    The convolution layer is followed by a pooling layer, which reduces the size of the feature map while maintaining the essential features. There are two types of pooling layers: max pooling and average pooling. Max pooling takes the largest element of each region of the feature map, whereas average pooling computes the average of the elements [28].

  3. Activation function

    One of the most important characteristics of any CNN is its activation function. There are several activation functions, such as sigmoid, tanh, softmax, and ReLU, each with its own importance. ReLU is the most commonly used activation function in DL and accounts for the nonlinear nature of the input data [28].
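The three building blocks above fit together as convolution → activation → pooling. A minimal PyTorch sketch (channel counts and kernel sizes are illustrative, not taken from the chapter):

```python
import torch
import torch.nn as nn

# Convolution -> ReLU -> max pooling, as described above.
block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding="same"),  # extracts a feature map
    nn.ReLU(),                                        # nonlinear activation
    nn.MaxPool2d(2),                                  # keeps the max of each 2x2 window
)

x = torch.randn(1, 3, 64, 64)   # dummy image batch
print(block(x).shape)           # torch.Size([1, 32, 32, 32])
```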

3.3 Residual network (ResNet)

A residual network is formed by stacking several residual blocks together. Each residual block consists of convolution layers, batch normalization, and activation layers. Batch normalization processes the data and brings numerical stability by using scaling techniques without distorting the structure of the data. The activation layer is added to the residual network to help the neural network learn more complex data. CNN and other deep learning methods use the ReLU (rectified linear unit) function in the activation layer to accommodate the nonlinear nature of the image data. The residual blocks allow information to flow from the first layers to the last layers of the network through residual or skip connections. Therefore, a ResNet can effectively carry features of the input data to the output of the network and thus alleviate the vanishing gradient problem.

Let x be the input to the residual block. After processing x with the two stacked convolution layers of a residual unit, we obtain F(W_i, x), where W_i are the weights of the convolution layers. In a ResNet, before the output of the stacked layers F(W_i, x) is passed to the next layer, the term x, the input of the residual block, is added to it, providing an additional identity mapping known as a skip connection. Therefore, the general formulation of a residual block can be represented as follows:

y = F(W_i, x) + x    (2)

Here x is the input and y is the output of the residual unit; y is then the input to the next residual block. The function F(W_i, x) represents the output of the stacked convolution layers, and W_i are the weights associated with the ith residual block. Figure 2 uses two convolution layers in the residual unit, so the output of this residual unit can be written as:

Figure 1.

HS–MS fusion using CNN.

F(x, W) = W_2 ReLU(W_1 x)    (3)

where ReLU represents the rectified linear unit activation function, and W_1 and W_2 are the weights associated with convolution layers 1 and 2 of the residual block. A deep residual network consists of many stacked residual blocks, and each block can be formulated in general as follows:

x_{i+1} = F(x_i, W_l) + x_i    (4)

where F is the output of the residual block with l stacked convolution layers, x_i is the residual connection to the ith residual block, and x_{i+1} is the output of the ith residual block, calculated through the skip connection by element-wise addition. After passing through the ReLU activation layer, the output of the residual network can be represented as:

y = ReLU(x_{i+1})    (5)
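Putting Eqs. (2)-(5) together, a two-stacked-layer residual block with batch normalization and an identity skip can be sketched in PyTorch as follows; the 64-channel width mirrors the architecture used later, but the block itself is a generic illustration:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two stacked convolutions with an identity skip: y = ReLU(F(x, W) + x),
    i.e. Eqs. (2)-(5). The 64-channel width is illustrative."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding="same")
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding="same")
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))  # ReLU(W_1 x)
        out = self.bn2(self.conv2(out))           # W_2 ReLU(W_1 x), Eq. (3)
        return self.relu(out + x)                 # skip connection, Eqs. (4)-(5)
```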

4. Problem formulation

Let Z ∈ R^{L×N} denote a high-resolution hyperspectral image with L spectral bands and N pixels. The observed LR-HSI, obtained by downsampling the spatial dimension of Z with a Gaussian blur factor d, is represented as Y_h ∈ R^{L×(N/d)} with L bands and N/d pixels. Similarly, the observed HR-MSI, obtained by downsampling the spectral dimension of Z, is represented as Y_m ∈ R^{L_m×N} with L_m bands and N pixels, where L_m < L [27]. Therefore, the hyperspectral image can be mathematically modeled as:

Z = EA + R    (6)

where Z is the original reference image, E and A are the endmember and abundance matrices, and R is the residual matrix.

The observed Y_h and Y_m, spatially and spectrally degraded versions of the image Z, are further represented mathematically by:

Y_m = SZ + R_m    (7)
Y_h = ZB + R_h    (8)

where B ∈ R^{N×(N/d)} is a Gaussian blur filter with blurring factor d used to blur the spatial content of the reference hyperspectral image Z to obtain the LR-HSI Y_h. The spectral response function S ∈ R^{L_m×L} is used to downsample the spectral content of the reference hyperspectral image Z to obtain the HR-MSI Y_m. The term L_m denotes the number of spectral bands in the multispectral image after downsampling. In this work, the reference image Z is downsampled in its spectral dimension using the standard Landsat 7 multispectral response, which provides high-quality visual imagery of the Earth's surface, as the HR-MSI with L_m = 7 [28]. Both B and S are sparse matrices containing zeros and ones. In the literature, the residual matrices R_m and R_h are generally assumed to be zero-mean Gaussian noise. Therefore, the original CNMF objective is written as:

CNMF(E, A) = ||Y_h − E A_h||_F^2 + ||Y_m − E_m A||_F^2    (9)

However, in this work we treat the residual terms R_m and R_h as nonnegative residual matrices to account for the nonlinearity effects in the image fusion [29]. The objective function of the original CNMF method expressed in Eq. (9) can then be rewritten as:

CNMF(E, A, R) = ||Y_h − (E A_h + R_h)||_F^2 + ||Y_m − (E_m A + R_m)||_F^2    (10)

Eq. (10) therefore represents the proposed model of HS-MS fusion, including the nonlinear nature of the image. To implement this model, we use the standard deep neural network architectures CNN and ResNet. For further enhancement of the proposed method, we implemented modified ResNet architectures with different stacked layers and multiple skip connections.
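As an illustration of the observation model in Eqs. (7) and (8), the following sketch simulates Y_h and Y_m from a reference cube Z using a per-band Gaussian blur with decimation and a spectral response matrix S; SciPy is assumed, and the blur width sigma is an illustrative choice:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(Z, S, d=4, sigma=2.0):
    """Simulate the observations of Eqs. (7)-(8) from a reference cube Z.
    Z: (L, H, W) HR-HSI; S: (Lm, L) spectral response; d: blur/decimation factor."""
    L, H, W = Z.shape
    # Y_h (Eq. 8): blur every band spatially, then decimate by d
    blurred = np.stack([gaussian_filter(Z[b], sigma) for b in range(L)])
    Y_h = blurred[:, ::d, ::d]
    # Y_m (Eq. 7): mix the L bands down to Lm bands with S
    Y_m = np.tensordot(S, Z, axes=([1], [0]))   # shape (Lm, H, W)
    return Y_h, Y_m
```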


5. Problem implementation

5.1 CNN fusion architecture

In the CNN architecture, a 1D CNN convolution operation is performed over the observed HS image Y_h of dimension L_h × N_h, with L_h spectral bands and N_h pixels, with the help of a filter to obtain the spectral data. In the same way, a 2D CNN convolution operation is performed over the observed MS image Y_m of dimension L_m × N_m, with L_m spectral bands and N_m pixels, to obtain the spatial data. Finally, the high spectral component obtained from Y_h and the high spatial component obtained from Y_m are fused together to reconstruct the HR-HSI. The entire deep neural network-based HS-MS fusion is shown in Figure 1.

In the CNN architecture, a Conv1D() convolution filter with kernel size r and weights v is used for extracting spectral data from the LR-HSI Y_h, represented as follows:

f_spec = Conv1D(ReLU(F(v_i, Y_h)))    (11)

Similarly, a Conv2D() convolution filter with kernel size r × r and weights w is used for extracting spatial data from the HR-MSI image Y_m, represented as:

f_spat = Conv2D(ReLU(F(w_ij, Y_m)))    (12)

The two convolutional branches use ReLU (rectified linear unit) activation functions, i.e., ReLU(x) = max(x, 0), to provide a nonlinear mapping of the data. Finally, the extracted spatial and spectral features are fused to obtain a high-quality reconstructed image, as shown in Eq. (13).

F = ReLU(f_spec × f_spat)    (13)

To implement this CNN fusion architecture, we use two convolution networks, 1D and 2D. Both use the same number of convolution layers and the same kernel sizes. Each network uses four convolution layers with 32, 64, 128, and 256 filters. Kernel sizes of 3 × 3 and 1 × 3 are used for the 2D CNN and 1D CNN, respectively, to extract the spatial and spectral information of the image. The architecture and parameters of the CNN HS-MS fusion are shown in Table 1.

Layer | Network | Filters | Kernel size | Stride | Padding | Activation
Conv 1 | Conv 1D | 32 | 1 × 3 | 1 | Same | ReLU
Conv 1 | Conv 2D | 32 | 3 × 3 | 1 | Same | ReLU
Conv 2 | Conv 1D | 64 | 1 × 3 | 1 | Same | ReLU
Conv 2 | Conv 2D | 64 | 3 × 3 | 1 | Same | ReLU
Conv 3 | Conv 1D | 128 | 1 × 3 | 1 | Same | ReLU
Conv 3 | Conv 2D | 128 | 3 × 3 | 1 | Same | ReLU
Conv 4 | Conv 1D | 256 | 1 × 3 | 1 | Same | ReLU
Conv 4 | Conv 2D | 256 | 3 × 3 | 1 | Same | ReLU
Output layer | Conv 1D | 1 | 1 × 1 | 1 | Same | ReLU
Output layer | Conv 2D | 1 | 1 × 1 | 1 | Same | ReLU

Table 1.

The Simple CNN Fusion Architecture.
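Table 1 maps directly onto two small sequential networks. A hedged PyTorch rendering (the input channel counts and the way the two branches are later joined are assumptions on our part, not details from the chapter):

```python
import torch.nn as nn

def branch(conv, in_ch=1, filters=(32, 64, 128, 256), k=3):
    """One branch of Table 1: four conv layers (stride 1, 'same' padding,
    ReLU) plus the single-filter 1x1 output layer. conv is nn.Conv1d
    (1 x 3 kernels, spectral branch) or nn.Conv2d (3 x 3, spatial branch)."""
    layers, ch = [], in_ch
    for f in filters:
        layers += [conv(ch, f, kernel_size=k, padding="same"), nn.ReLU()]
        ch = f
    layers += [conv(ch, 1, kernel_size=1), nn.ReLU()]   # output layer
    return nn.Sequential(*layers)

spectral_branch = branch(nn.Conv1d)   # operates along the spectral axis of Y_h
spatial_branch  = branch(nn.Conv2d)   # operates over the spatial axes of Y_m
```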

In a CNN, each layer takes as input the output of the previous layer, which loses information as the network architecture goes deeper. This problem in deep neural networks leads to overfitting of the data and is known as the vanishing gradient problem [24]. To overcome this, we implemented HS-MS fusion using an alternative ResNet-based network architecture. In the ResNet, we introduce skip connections between the convolution layers. A skip connection helps to carry identity information throughout the deep convolutional network.

5.2 Resnet fusion architecture

The ResNet fusion architecture for HS-MS fusion uses residual or skip connections, which improve the feature extraction capability. For implementation, we use a 1D ResNet to extract spectral features from the LR-HSI and a 2D ResNet to extract spatial features from the HR-MSI. Both the 1D and 2D ResNet architectures consist of three residual blocks, each having two convolutional layers and 64 filters, as shown in Figure 2. A 3 × 3 kernel size for the 2D ResNet and a 1 × 3 kernel size for the 1D ResNet are used for extracting the spatial and spectral data from the MSI and HSI. Each residual block has a ReLU activation layer to accommodate the nonlinearity constraints included in the proposed hyperspectral image fusion model, as explained in Eq. (10). Finally, the feature embedding and image reconstruction are performed using another 2D CNN.

Figure 2.

Residual block with two stacked layers.

  1. Spectral generative network

    The spectral data of the hyperspectral image Y_h are extracted using the 1D ResNet. Initially, spectral data are extracted from the LR-HSI using a 1D CNN, and the residual connection r(Y_h) is mapped across the stacked convolution layers. The sum of the 1D CNN output and r(Y_h) is given as input to the next residual block, and this process is repeated for every residual block in the ResNet. The entire process in the 1D ResNet is expressed mathematically as:

    f(Y_h^l) = ReLU(W_l Y_h^l)    (14)
    f_spec(Y_h^l) = f(Y_h^l) + r(Y_h^l)    (15)

    Therefore, the output of the ith residual block is represented as:

    f_spec^i = f_spec^{i−1}(Y_h^l) + r^{i−1}(Y_h^l)    (16)

    where Y_h denotes the input LR-HSI data, i indexes the residual units i = 1, 2, 3, …, I, and l indexes the convolution layers l = 1, 2, 3, …, L. The weights of the convolution kernels are represented as W. Finally, the ReLU activation function is applied to introduce nonlinearity into the output of the deep network:

    F_spec = ReLU(f_spec)    (17)

  2. Spatial generative network

    The spatial data of the HR-MSI Y_m are extracted using the 2D ResNet. Initially, spatial data are extracted from the HR-MSI using a 2D CNN, and the residual connection r(Y_m) is mapped across the stacked convolution layers. The sum of the 2D CNN output and r(Y_m) is given as input to the next residual block, and this process is repeated for every residual block in the ResNet. The entire process in the 2D ResNet is expressed mathematically as:

    f(Y_m^l) = ReLU(W_l Y_m^l)    (18)
    f_spat(Y_m^l) = f(Y_m^l) + r(Y_m^l)    (19)

    Therefore, the output of the ith residual block is represented as:

    f_spat^i = f_spat^{i−1}(Y_m^l) + r^{i−1}(Y_m^l)    (20)

    where Y_m denotes the input HR-MSI data, i indexes the residual blocks i = 1, 2, 3, …, I, and l indexes the convolution layers l = 1, 2, 3, …, L. The weights of the convolution kernels are represented as W. Finally, similar to the spectral extraction, ReLU is applied to introduce nonlinearity into the spatial output of the deep network:

    F_spat = ReLU(f_spat)    (21)

  3. Fusion of spectral-spatial data

    The spectral data from the LR-HSI and the spatial data from the HR-MSI are extracted using the ResNets with sizes (1 × 1 × Spec) and (Spat × Spat × 1), respectively. After obtaining the spatial and spectral features, the next step is to fuse this information by element-wise multiplication.

    F_Z = F_spec × F_spat    (22)

    Then, the feature embedding and image reconstruction are performed using a ReLU activation layer. The proposed ResNet fusion framework is shown in Figure 3. Therefore, the final generated HR-HSI Z can be written as:

    Z = ReLU(F_Z)    (23)

  4. Different stacked layers and skip connection

    We also propose an extension of the ResNet fusion architecture that varies the number of stacked convolution layers (2 to 4) in the residual block to increase the fusion performance of the deep network. The two-layer residual block contains two stacked convolution layers followed by a ReLU activation layer; similarly, the three-layer and four-layer residual blocks contain three and four stacked convolution layers followed by a ReLU activation layer. In addition, we extend the ResNet fusion architecture with different skip connections, which help to regulate the flow of information through a deeper network more effectively. For this, we use long skip and dense skip connections, as shown in Figure 4. The long skip connections are designed by creating a connection between alternate residual layers i and i + 2, along with a short skip connection between every layer in the ResNet. In a dense skip connection, each layer i obtains additional input from all the preceding layers and passes its own feature maps to all the subsequent layers. Using dense skip connections, each layer in the ResNet receives feature maps from all the preceding layers, which limits the number of filters and network parameters needed for extracting deep features. To obtain a high-fidelity reconstructed image, we propose a modified version of the ResNet with long and dense skip connections, as shown in Figure 4.

Figure 3.

The framework of the proposed ResNet Fusion architecture.

Figure 4.

Representation of short, long, and dense skip connection on ResNet.

Figure 4 shows three ResNet architectures, each having three residual blocks (Res Block), with three different types of skip connections. Algorithm 1 summarizes the procedure of our proposed ResNet fusion method; a code sketch of the same procedure follows the algorithm.

Algorithm 1: ResNet Fusion
Input: LR hyperspectral image Y_h and HR multispectral image Y_m
begin
  1. Extract spectral features from Y_h and spatial features from Y_m using the ResNet

  2. r(Y_h) ← Y_h and r(Y_m) ← Y_m

  3. for each residual block i = 1, 2, 3, …, I do

  4. for each convolution layer l = 2, 3, 4 in the residual block do  # stacked convolution layers

    f(Y_h^l) = ReLU(W_l Y_h^l)

    f(Y_m^l) = ReLU(W_l Y_m^l)

    end for

    # add the residual connection

    f_spec(Y_h^l) = f(Y_h^l) + r(Y_h^l)

    f_spat(Y_m^l) = f(Y_m^l) + r(Y_m^l)

    r(Y_h) ← f_spec(Y_h^l)

    r(Y_m) ← f_spat(Y_m^l)

    end for

  5. The extracted spectral features F_spec of size (1 × 1 × Spec) and spatial features F_spat of size (Spat × Spat × 1) are fused by element-wise multiplication:

  6. F_Z = F_spec × F_spat

  7. The HR-HSI is generated after feature embedding and image reconstruction using a ReLU activation layer:

  8. Z = ReLU(F_Z)

end
Output: HR hyperspectral image Z
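The loop structure of Algorithm 1 can be rendered compactly in PyTorch. In the sketch below, the inputs are assumed to be already embedded to 64 feature maps, and the broadcasting shapes of the fused features follow step 5; these plumbing details are ours, not the chapter's:

```python
import torch
import torch.nn as nn

class ResNetBranch(nn.Module):
    """Steps 2-4 of Algorithm 1 for one branch: I residual blocks, each with
    `stacked` convolutions and a carried skip connection r. The input is
    assumed to be already embedded to `channels` feature maps."""
    def __init__(self, conv=nn.Conv2d, channels=64, blocks=3, stacked=2):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.ModuleList([conv(channels, channels, 3, padding="same")
                           for _ in range(stacked)])
            for _ in range(blocks)
        ])
        self.relu = nn.ReLU()

    def forward(self, x):
        r = x                               # step 2: r(Y) <- Y
        for layers in self.blocks:          # step 3: each residual block
            out = r
            for conv in layers:             # step 4: stacked convolutions
                out = self.relu(conv(out))  # f = ReLU(W_l Y^l)
            r = out + r                     # residual connection, Eqs. (15)/(19)
        return self.relu(r)

def fuse(f_spec, f_spat):
    """Steps 5-8: broadcasted element-wise product followed by ReLU,
    assuming f_spec is a (1 x 1 x Spec)-style vector and f_spat a spatial map."""
    return torch.relu(f_spec * f_spat)      # Z = ReLU(F_spec x F_spat)
```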


6. Results and discussion

In this paper, we initially implemented CNN-based fusion by extracting the spectral data from the LR-HSI using a 1D convolution network and the spatial data from the HR-MSI using a 2D convolution network. These extracted spatial and spectral features are then fused together to obtain the HR-HSI. Extracting more detailed features from HS and MS images requires a deep CNN architecture, but as the CNN architecture becomes deeper it introduces the vanishing gradient problem. To overcome this, we implemented an unsupervised ResNet fusion network using skip connections. The proposed ResNet fusion inherits all the advantages of the standard CNN and additionally allows the design of a deeper network without performance degradation during feature extraction. Therefore, the proposed ResNet fusion architecture extracts more discriminative features from both the HSI and MSI and finally reconstructs a high-resolution HSI by fusing these high-quality features.

The performance of the CNN and ResNet fusion methods is evaluated on four benchmark datasets using the standard quality measures SAM, ERGAS, PSNR, and UIQI [30]. We also compared the performance of CNN and ResNet fusion against the baseline fusion methods CNMF [14], FC-CNMF, and S2FEF-CNN [23]. Among these, CNN shows better performance than CNMF and FC-CNMF, and ResNet-based fusion shows outstanding performance compared with all other methods including CNN. The results obtained by the CNN and ResNet fusion methods against the baselines on the four benchmark datasets are shown in Table 2. A low SAM indicates good spectral quality in the fused image, and a low ERGAS indicates good statistical quality of the reconstructed image. High PSNR and UIQI indicate good spatial quality and a high-fidelity reconstruction with little spectral distortion. Table 2 further shows that good spectral preservation is obtained on the Botswana dataset, whose SAM value is reduced by more than 0.02. At the same time, significant spatial preservation is achieved on the Indian Pines dataset, whose PSNR value increases by about 1.5 dB.

Dataset | Metric | CNMF | FC-CNMF | CNN | S2FEF-CNN | ResNet
Pavia University | SAM | 0.0633 | 0.0652 | 0.0451 | 0.0441 | 0.0409
Pavia University | ERGAS | 0.5423 | 0.4502 | 0.4311 | 0.4901 | 0.4029
Pavia University | PSNR | 64.4502 | 64.8923 | 65.1299 | 64.4915 | 66.1127
Pavia University | UIQI | 0.8779 | 0.9316 | 0.9262 | 0.9665 | 0.9872
Indian Pines | SAM | 0.5113 | 0.3976 | 0.4525 | 0.4118 | 0.3896
Indian Pines | ERGAS | 0.8733 | 0.6991 | 0.6434 | 0.7192 | 0.6170
Indian Pines | PSNR | 62.6779 | 63.1076 | 63.1311 | 64.8165 | 65.2971
Indian Pines | UIQI | 0.7988 | 0.8432 | 0.8118 | 0.8776 | 0.8991
Washington DC Mall | SAM | 0.5609 | 0.5998 | 0.5956 | 0.5519 | 0.5171
Washington DC Mall | ERGAS | 0.5741 | 0.5034 | 0.4993 | 0.4886 | 0.4850
Washington DC Mall | PSNR | 64.09 | 64.12 | 64.19 | 65.11 | 65.1358
Washington DC Mall | UIQI | 0.9199 | 0.9409 | 0.9213 | 0.9365 | 0.9656
Botswana | SAM | 0.2541 | 0.2179 | 0.2233 | 0.2108 | 0.1908
Botswana | ERGAS | 0.5194 | 0.4989 | 0.5034 | 0.4992 | 0.4698
Botswana | PSNR | 63.1123 | 63.4321 | 63.9019 | 64.0116 | 64.8798
Botswana | UIQI | 0.9703 | 0.9772 | 0.9715 | 0.9827 | 0.9960

Table 2.

The performance evaluation of different fused algorithms on four hyperspectral datasets.
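For reference, two of the four quality measures reported in Table 2 can be computed as below; these are the standard definitions of SAM and PSNR, written as a hedged NumPy sketch rather than the exact evaluation code used for the table:

```python
import numpy as np

def sam(ref, est, eps=1e-12):
    """Mean spectral angle (radians) between cubes of shape (L, N);
    lower values mean better spectral preservation."""
    num = np.sum(ref * est, axis=0)
    den = np.linalg.norm(ref, axis=0) * np.linalg.norm(est, axis=0) + eps
    return float(np.mean(np.arccos(np.clip(num / den, -1.0, 1.0))))

def psnr(ref, est):
    """Peak signal-to-noise ratio in dB; higher means better spatial quality."""
    mse = np.mean((ref - est) ** 2)
    return float(10 * np.log10(ref.max() ** 2 / mse))
```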

The above work is extended by varying the number of stacked convolution layers in the residual blocks of the ResNet. The experimental results obtained with different stacked convolution layers are shown in Table 3. The SAM values in Table 3 make clear that the spectral quality of the image degrades as the number of stacked layers in the residual block increases. The UIQI values in Table 3 likewise reveal that the quality of the reconstructed image diminishes as the number of stacked layers increases. The PSNR and ERGAS values remain stable, which confirms the spatial consistency of our proposed method. We therefore conclude from Table 3 that the ResNet fusion network with two stacked convolution layers acquires the most discriminative features from the source images and guarantees the quality of the reconstructed image.

Dataset | Metric | 2 layers | 3 layers | 4 layers
Pavia University | SAM | 0.0409 | 0.065 | 0.069
Pavia University | ERGAS | 0.4029 | 0.4029 | 0.4029
Pavia University | PSNR | 66.1127 | 66.1127 | 66.1127
Pavia University | UIQI | 0.9872 | 0.9713 | 0.9622
Indian Pines | SAM | 0.3896 | 0.4186 | 0.4553
Indian Pines | ERGAS | 0.6170 | 0.6170 | 0.6170
Indian Pines | PSNR | 65.2971 | 65.2971 | 65.2971
Indian Pines | UIQI | 0.8991 | 0.8904 | 0.8801
Washington DC Mall | SAM | 0.5171 | 0.5529 | 0.5721
Washington DC Mall | ERGAS | 0.4850 | 0.4850 | 0.4850
Washington DC Mall | PSNR | 65.1358 | 65.1358 | 65.1358
Washington DC Mall | UIQI | 0.9656 | 0.9432 | 0.9209
Botswana | SAM | 0.1908 | 0.1978 | 0.2085
Botswana | ERGAS | 0.4698 | 0.4698 | 0.4698
Botswana | PSNR | 64.8798 | 64.8798 | 64.8798
Botswana | UIQI | 0.9960 | 0.9822 | 0.9589

Table 3.

The performance of ResNet fusion by varying the stacked layers.

Figure 5 shows the visual representation of the output of our proposed ResNet fusion method on the four benchmark datasets against all other baseline methods. From the figure, it is evident that ResNet fusion with two stacked convolution layers produces better results in most of the highlighted areas of the four datasets (Figure 5).

Figure 5.

The ground truth and fused image of different methods using four benchmark datasets.

We further extend the ResNet fusion architecture to reduce the number of parameters, making our proposed method more efficient and effective in handling high-dimensional data. For that, we add short skip, long skip, and dense skip connections to the ResNet architecture with two stacked convolution layers. Table 4 gives the total number of network parameters required by the ResNet architecture with each skip connection. From Table 4, it is clear that the ResNet architecture with dense skip connections requires far fewer network parameters than the ResNets with short and long skip connections.

Architecture | Number of parameters
CNN | 31,586,081
ResNet with short skip | 8,045,825
ResNet with long skip | 390,529
ResNet with dense skip | 19,393

Table 4.

The number of network parameters for different skip connections.
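Counts like those in Table 4 can be reproduced for any PyTorch model with a one-line helper:

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total trainable parameters, comparable to the counts in Table 4."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```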

  1. Time complexity

    The performance and running time of all the proposed algorithms on the four benchmark datasets are compared in Figure 6. From this figure, it is evident that ResNet fusion with dense skip connections takes the least running time while reconstructing a high-fidelity hyperspectral image. Comparing the ResNets with long skip and short skip connections, the long skip architecture shows better performance and running time than the short skip architecture. Evaluating all the ResNet fusion architectures together, the ResNet with dense skip connections outperforms the other two. Among the baselines, the FC-CNMF method shows better performance and running time than CNN-based fusion. Finally, we conclude that the ResNet with dense skip connections, with its smaller number of network parameters, shows the best performance for reconstructing an HR-HSI with good spatial and spectral quality compared with all other proposed methods. However, although all our proposed methods perform well, the cost incurred in terms of time remains high.

  2. Resnet HS-MS fusion model

    The experimental analysis of our ResNet fusion architecture with various parameters was carried out to build a general model for our proposed HS-MS ResNet fusion algorithm. For this purpose, we trained the network using cropped HSI and MSI image pairs from each dataset: each dataset is cropped into several patches and then divided into training and testing data (see the patch-cropping sketch below). For the Pavia University dataset of size 610 × 340 × 103, a patch size of M × N × L = 15 × 15 × 103 gave high performance for our network model. Similarly, we created training and testing samples for the other three datasets. The patch size for the Washington DC Mall dataset was M × N × L = 19 × 19 × 191, for the Botswana dataset M × N × L = 17 × 17 × 145, and for the Indian Pines dataset M × N × L = 19 × 19 × 192, giving a network model with good running time and network parameters.

    We measured the quality metrics of our ResNet fusion while varying the number of stacked layers and found that residual blocks with two stacked convolution layers perform better than the others. The most significant part of a ResNet is the skip connection, which helps information flow through the network more efficiently and effectively. We therefore also experimented with three skip connections: short skip, long skip, and dense skip. From this experiment, we found that the ResNet with dense skip connections reduces the number of network parameters to a large extent.

    Finally, we built a generative ResNet model for the fusion of HS-MS images, as shown in Table 5. The ResNet fusion model uses 1D and 2D convolution networks. These two convolution networks consist of three residual blocks, each containing two convolution layers with 64 filters, a 3 × 3 kernel size, stride = 1, max pooling, and padding = same. To make the information flow accurately throughout the network, we use dense skip connections. Finally, a 2D convolution decodes the reconstructed image into the original format.
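A sketch of the patch preparation described above, assuming the cube is held as a NumPy array; the stride and the 80/20 split are illustrative assumptions:

```python
import numpy as np

def crop_patches(cube, patch=15, stride=15):
    """Crop an (H, W, L) hyperspectral cube into (patch, patch, L) tiles,
    e.g. 15 x 15 x 103 patches for Pavia University as described above."""
    H, W, _ = cube.shape
    tiles = [cube[i:i + patch, j:j + patch, :]
             for i in range(0, H - patch + 1, stride)
             for j in range(0, W - patch + 1, stride)]
    return np.stack(tiles)

def split(tiles, frac=0.8, seed=0):
    """Shuffle the tiles and split them into training and testing sets."""
    idx = np.random.default_rng(seed).permutation(len(tiles))
    cut = int(frac * len(tiles))
    return tiles[idx[:cut]], tiles[idx[cut:]]
```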

Figure 6.

The running time of traditional and deep learning HS-MS image fusion.

Name | Layer | Network | Kernel size | Input size | Input content | Stride | Padding | Activation | Output size | Output content
Input layer | Conv 1 | 1D-CNN | 1 × 3 | 1 | 1D image (spectral) | 1 | same | ReLU | 64 | 1DConv1
Input layer | Conv 1 | 2D-CNN | 3 × 3 | 2 | 2D image (spatial) | 1 | same | ReLU | 64 | 2DConv1
Residual block 1 | Conv 2 | 1D-CNN | 1 × 3 | 64 | 1DConv1 | 1 | same | ReLU | 64 | 1DConv2
Residual block 1 | Conv 2 | 2D-CNN | 3 × 3 | 64 | 2DConv1 | 1 | same | ReLU | 64 | 2DConv2
Residual block 1 | Conv 3 | 1D-CNN | 1 × 3 | 64 | 1DConv2 | 1 | same | ReLU | 64 | 1DConv3
Residual block 1 | Conv 3 | 2D-CNN | 3 × 3 | 64 | 2DConv2 | 1 | same | ReLU | 64 | 2DConv3
Residual block 1 | Skip connection | Add 1 | | | 1DConv1 + 1DConv3 | | | | | 1DResB1
Residual block 1 | Skip connection | Add 1 | | | 2DConv1 + 2DConv3 | | | | | 2DResB1
Residual block 2 | Conv 4 | 1D-CNN | 1 × 3 | 64 | 1DResB1 | 1 | same | ReLU | 64 | 1DConv4
Residual block 2 | Conv 4 | 2D-CNN | 3 × 3 | 64 | 2DResB1 | 1 | same | ReLU | 64 | 2DConv4
Residual block 2 | Conv 5 | 1D-CNN | 1 × 3 | 64 | 1DConv4 | 1 | same | ReLU | 64 | 1DConv5
Residual block 2 | Conv 5 | 2D-CNN | 3 × 3 | 64 | 2DConv4 | 1 | same | ReLU | 64 | 2DConv5
Residual block 2 | Skip connection | Add 2 | | | 1DConv1 + 1DResB1 + 1DConv5 | | | | | 1DResB2
Residual block 2 | Skip connection | Add 2 | | | 2DConv1 + 2DResB1 + 2DConv5 | | | | | 2DResB2
Residual block 3 | Conv 6 | 1D-CNN | 1 × 3 | 64 | 1DResB2 | 1 | same | ReLU | 64 | 1DConv6
Residual block 3 | Conv 6 | 2D-CNN | 3 × 3 | 64 | 2DResB2 | 1 | same | ReLU | 64 | 2DConv6
Residual block 3 | Conv 7 | 1D-CNN | 1 × 3 | 64 | 1DConv6 | 1 | same | ReLU | 64 | 1DConv7
Residual block 3 | Conv 7 | 2D-CNN | 3 × 3 | 64 | 2DConv6 | 1 | same | ReLU | 64 | 2DConv7
Residual block 3 | Skip connection | Add 3 | | | 1DConv1 + 1DResB1 + 1DResB2 + 1DConv7 | | | | | 1DResB3
Residual block 3 | Skip connection | Add 3 | | | 2DConv1 + 2DResB1 + 2DResB2 + 2DConv7 | | | | | 2DResB3
Max pooling | Conv 8 | 1D-CNN | 1 × 3 | 64 | 1DResB3 | 1 | same | ReLU | 32 | 1DConv8
Max pooling | Conv 8 | 2D-CNN | 3 × 3 | 64 | 2DResB3 | 1 | same | ReLU | 32 | 2DConv8
Flatten layer | Conv 9 | 1D-CNN | 1 × 1 | 32 | 1DConv8 | 1 | same | ReLU | 1 | Spectral data
Flatten layer | Conv 9 | 2D-CNN | 1 × 1 | 32 | 2DConv8 | 1 | same | ReLU | 1 | Spatial data
Upsampling layer | Conv 10 | 2D-CNN | 3 × 3 | 1 | Spectral/spatial data | 1 | same | ReLU | 32 | Spectral × Spatial
Output layer | Conv 11 | 2D-CNN | 3 × 3 | 32 | Spectral × Spatial | 1 | same | ReLU | 64 | Fused image

Table 5.

The ResNet dense-skip architecture for HS-MS image fusion.
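The Add rows of Table 5 describe the dense-skip wiring: the skip sum of block i collects the first convolution output and every earlier block output. A hedged PyTorch sketch of one such branch (single-channel input and 64 filters assumed; not the exact training code behind Table 5):

```python
import torch.nn as nn

class DenseSkipBranch(nn.Module):
    """Dense-skip wiring of Table 5: the Add layer of block i sums the
    block's own stacked-convolution output with the stem output and
    every earlier block output."""
    def __init__(self, conv=nn.Conv2d, channels=64, blocks=3):
        super().__init__()
        self.stem = conv(1, channels, 3, padding="same")   # Conv 1 of Table 5
        self.blocks = nn.ModuleList([
            nn.Sequential(conv(channels, channels, 3, padding="same"), nn.ReLU(),
                          conv(channels, channels, 3, padding="same"), nn.ReLU())
            for _ in range(blocks)
        ])
        self.relu = nn.ReLU()

    def forward(self, x):
        carried = [self.relu(self.stem(x))]   # feeds every later Add layer
        out = carried[0]
        for block in self.blocks:
            out = block(out)                  # two stacked convolutions
            out = sum(carried) + out          # Add i: dense skip connection
            carried.append(out)               # pass this block's map forward
        return out
```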


7. Conclusion

In this work, we implemented HS-MS fusion with deep learning methods because of their strong ability to extract features from images. At first, we implemented the HS-MS fusion process with a conventional CNN. However, in a CNN each layer takes the output of the previous layer, which tends to lose information as the network goes deeper. We therefore also implemented the fusion process in a ResNet by adding skip connections between the convolution layers. These skip connections help to extract more detailed features from the images without degradation problems. Our ResNet fusion architecture includes three residual blocks, and each block is a combination of stacked convolution layers and skip connections. Moreover, we modified the ResNet fusion architecture with different numbers of stacked layers and found that the ResNet with two stacked layers gives the most accurate results. Finally, we extended the ResNet architecture to reduce the number of parameters by using different skip connections: short skip, long skip, and dense skip. From the experimental analysis, we found that the ResNet with dense skip connections improves image reconstruction performance with far fewer network parameters and less running time than the other fusion methods. This deep residual network helps to extract nonlinear features with the help of the ReLU activation layer. The experiments and performance analysis of our algorithm were carried out quantitatively on four benchmark datasets. The fusion results indicate that the ResNet with dense skip fusion shows outstanding performance over traditional and DL methods, preserving the spatial and spectral data of the reconstructed image to a large extent.

References

  1. Hagen N, Kudenov MW. Review of snapshot spectral imaging technologies. Optical Engineering. 2013;52(10):090901
  2. Feng F, Zhao B, Tang L, Wang W, Jia S. Robust low-rank abundance matrix estimation for hyperspectral unmixing. IET International Radar Conference (IRC 2018). 2019;2019(21):6406-6409
  3. Dhore AD, Veena CS. Evaluation of various pansharpening methods using image quality metrics. In: 2nd International Conference on Electronics and Communication Systems (ICECS). IEEE; 2015. DOI: 10.1109/ecs.2015.7125039
  4. Wang Z, Chen B, Lu R, Zhang H, Liu H, Varshney PK. FusionNet: An unsupervised convolutional variational network for hyperspectral and multispectral image fusion. IEEE Transactions on Image Processing. 2020;29:7565-7577
  5. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2016. pp. 770-778
  6. Loncan L, de Almeida LB, Bioucas-Dias JM, Briottet X, et al. Hyperspectral pansharpening: A review. IEEE Geoscience and Remote Sensing Magazine. 2015;3(3):27-46
  7. Vivone G, et al. A critical comparison among pansharpening algorithms. IEEE Transactions on Geoscience and Remote Sensing. 2015;53(5):2565-2586
  8. Wei Q, Bioucas-Dias J, Dobigeon N, Tourneret J-Y. Hyperspectral and multispectral image fusion based on a sparse representation. IEEE Transactions on Geoscience and Remote Sensing. 2015;53:3658-3668
  9. Paatero P, Tapper U. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics. 1994;5:111-126
  10. Lee DD, Seung HS. Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press; 2001. pp. 556-562
  11. Tong L, Zhou J, Qian B, Yu J, Xiao C. Adaptive graph regularized multilayer nonnegative matrix factorization for hyperspectral unmixing. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2020;13:434-447
  12. Cao J, et al. An endmember initialization scheme for nonnegative matrix factorization and its application in hyperspectral unmixing. ISPRS International Journal of Geo-Information. 2018;7:195. DOI: 10.3390/ijgi7050195
  13. Nascimento JMP, Bioucas-Dias JM. Vertex component analysis: A fast algorithm to unmix hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing. 2005;43(4)
  14. Yokoya N, Yairi T, Iwasaki A. Coupled nonnegative matrix factorization unmixing for hyperspectral and multispectral data fusion. IEEE Transactions on Geoscience and Remote Sensing. 2012;50:528-537
  15. Simoes M, Bioucas-Dias J, Almeida L, Chanussot J. A convex formulation for hyperspectral image super resolution via subspace-based regularization. IEEE Transactions on Geoscience and Remote Sensing. 2015;53:3373-3388
  16. Lin C-H, Ma F, Chi C-Y, Hsieh C-H. A convex optimization-based coupled nonnegative matrix factorization algorithm for hyperspectral and multispectral data fusion. IEEE Transactions on Geoscience and Remote Sensing. 2018;56(3):1652-1667. DOI: 10.1109/tgrs.2017.2746078
  17. Yang F, Ma F, Ping Z, Xu G. Total variation and signature-based regularizations on coupled nonnegative matrix factorization for data fusion. IEEE Access. 2019;7:2695-2706. DOI: 10.1109/ACCESS.2018.2857943
  18. Yang F, Ping Z, Ma F, Wang Y. Fusion of hyperspectral and multispectral images with sparse and proximal regularization. IEEE Access. 2019. DOI: 10.1109/ACCESS.2019.2961240
  19. Palsson F, Sveinsson JR, Ulfarsson MO. Multispectral and hyperspectral image fusion using a 3-D convolutional neural network. IEEE Geoscience and Remote Sensing Letters. 2017;14:639-643
  20. Masi G, Cozzolino D, Verdoliva L, Scarpa G. Pansharpening by convolutional neural networks. Remote Sensing. 2017;8(7):594
  21. Shao Z, Cai J. Remote sensing image fusion with deep convolutional neural network. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2018;11(5):1656-1669
  22. Yang J, Zhao Y-Q, Chan J. Hyperspectral and multispectral image fusion via deep two-branches convolutional neural network. Remote Sensing. 2019;10(5):800
  23. Chen L, Wei Z, Xu Y. A lightweight spectral-spatial feature extraction and fusion network for hyperspectral image classification. Remote Sensing. 2020;12:1395. DOI: 10.3390/rs12091395
  24. Song W, Li S, Fang L, Lu T. Hyperspectral image classification with deep feature fusion network. IEEE Transactions on Geoscience and Remote Sensing. 2018;56(7):3173-3184
  25. Available from: http://lesun.weebly.com/hyperspectral-data-set.html
  26. Goodfellow I, Bengio Y, Courville A. Deep Learning. MIT Press; 2016. Available from: https://www.deeplearningbook.org/
  27. Ma F, Yang F, Ping Z, Wang W. Joint spatial-spectral smoothing in a minimum-volume simplex for hyperspectral image super-resolution. Applied Sciences. 2019;10(1)
  28. Available from: https://www.usgs.gov/landsat-missions/landsat-7
  29. Hong D, Yokoya N, Chanussot J, Zhu XX. An augmented linear mixing model to address spectral variability for hyperspectral unmixing. IEEE Transactions on Image Processing. 2018
  30. Wang Z, Bovik AC. A universal image quality index. IEEE Signal Processing Letters. 2002;9(3):81-84
