
Application of Deep Learning Approaches for Enhancing Mastcam Images

By Ying Qu, Hairong Qi and Chiman Kwan

Submitted: January 7th, 2020; Reviewed: July 22nd, 2020; Published: November 4th, 2020

DOI: 10.5772/intechopen.93446


Abstract

There are two mast cameras (Mastcam) onboard the Mars rover Curiosity. Both are multispectral imagers with nine bands each, and the right Mastcam has three times the spatial resolution of the left. In this chapter, we apply some recently developed deep neural network models to enhance the left Mastcam images with the help of the right Mastcam images. Actual Mastcam images are used to demonstrate the performance of the proposed algorithms.

Keywords

  • Mastcam
  • Curiosity rover
  • image fusion
  • pansharpening
  • deep learning
  • Dirichlet-net
  • U-net
  • transition learning

1. Introduction

The Curiosity rover (Figure 1) has several instruments that are used to characterize the Mars surface. For example, the Alpha Particle X-Ray Spectrometer (APXS) [1] can analyze rock samples collected by the robotic arm and extract their compositions; the Laser-Induced Breakdown Spectroscopy (LIBS) instrument [2] can extract spectral features from vaporized rock plumes and deduce rock compositions at a distance of 7 m; and the Mastcam imagers [3] can perform surface characterization from 1 km away.

Figure 1.

Curiosity rover and its onboard instruments [7].

The two Mastcam multispectral imagers are separated by 24.2 cm [3]. As shown in Figure 2, the left Mastcam (34 mm focal length) has three times the field of view of the right Mastcam (100 mm focal length); in other words, the right imager has three times the spatial resolution of the left. To generate stereo images or to construct a 12-band image cube by fusing bands from the left and right Mastcams [4, 5, 6], a practical solution is to downsample the right images to the resolution of the left images, which avoids the artifacts caused by the Bayer pattern [7] and the JPEG compression loss [8]. Although this approach has practical merits, it may limit the potential of the Mastcams. First, downsampling the right images throws away the high spatial resolution pixels in the right bands. Second, the lower resolution of the current stereo images may degrade the augmented reality or virtual reality experience of users. If one can instead apply advanced pansharpening algorithms to the left bands, one obtains a 12-band high-resolution image cube for both stereo vision and image fusion.

Figure 2.

The two Mastcam imagers [9]. (a) Left Mastcam (b) Right Mastcam.

In the past two decades, there have been many papers discussing the fusion of a high-resolution panchromatic (pan) image with a low-resolution multispectral image (MSI) [10, 11, 12, 13, 14]. This is known as pansharpening. In our recent papers [15, 16], we proposed an unsupervised network structure to address the image fusion/super-resolution (SR) problem for hyperspectral images (HSI), referred to as HSI-SR, where a low-resolution (LR) HSI with high spectral resolution and a high-resolution (HR) MSI with low spectral resolution are fused to generate an HSI with high resolution in both the spatial and spectral dimensions. Similar to MSI, HSI has found extensive applications [17, 18, 19, 20, 21]. In this chapter, we adopt the approach designed in [15, 16], referred to as the unsupervised sparse Dirichlet network (uSDN), to enhance Mastcam images: we treat the right Mastcam image as an MSI with high spatial resolution and the left Mastcam image as an HSI with low spatial resolution.

In this chapter, we focus on the application of uSDN to enhance Mastcam images. In Section 2, we first formulate the HSI-SR problem and briefly summarize the key ideas of uSDN. In Section 3, we apply uSDN, with some further improvements, to actual Mastcam images. In Section 4, we combine the Dirichlet-Net with a U-Net to handle mis-registered image pairs. In Section 5, we introduce a transition learning concept, a natural extension of uSDN, together with some preliminary results. Finally, we conclude the chapter with a few remarks.

2. The uSDN algorithm for HSI-SR

In this section, we describe the uSDN algorithm developed in [15, 16]; for more details, please refer to those references. We first formulate the problem of HSI-SR to facilitate the discussion of Mastcam enhancement. Table 1 summarizes the mathematical symbols used in this chapter.

HSI: Hyperspectral image
MSI: Multispectral image
HSI-SR: HSI super-resolution
HR: High-resolution
LR: Low-resolution
$\bar{Y}_h$ / $Y_h$: 3D/2D LR HSI
$\bar{Y}_m$ / $Y_m$: 3D/2D HR MSI
$\bar{X}$ / $X$: 3D/2D reconstructed HR HSI
$\Phi_h$: Spectral bases of the HSI
$\Phi_m$: Spectral bases of the MSI
$S_h$: Coefficients/representations of the HSI
$S_m$: Coefficients/representations of the MSI
$R$: Transformation matrix
$\hat{Y}_h$: Reconstructed 2D HSI
$W, b$: Network weights and biases
$E_h(\theta_{he})$ / $E_m(\theta_{me})$: Encoder of the HSI/MSI
$D_h(\theta_{hd})$: Decoder of the HSI and MSI
$\theta_{he}$ / $\theta_{me}$: Encoder weights of the HSI/MSI
$\theta_{hd}$: Decoder weights of the HSI and MSI
$s$: Representation vector of a single pixel
$v, u, \beta$: Stick-breaking parameters
$H_p(s)$: Entropy function
$A(\tilde{S}_h, S_m)$: Angular difference

Table 1.

Symbols and abbreviations.

The basic idea of uSDN is illustrated in Figure 3. First, the LR HSI, $\bar{Y}_h \in \mathbb{R}^{m \times n \times L}$, with its width, height, and number of spectral bands denoted as $m$, $n$, and $L$, respectively, is unfolded into a 2D matrix, $Y_h \in \mathbb{R}^{mn \times L}$. Similarly, the HR MSI, $\bar{Y}_m \in \mathbb{R}^{M \times N \times l}$, with its width, height, and number of spectral bands denoted as $M$, $N$, and $l$, respectively, is unfolded into a 2D matrix $Y_m \in \mathbb{R}^{MN \times l}$, and the SR HSI, $\bar{X} \in \mathbb{R}^{M \times N \times L}$, is unfolded into a 2D matrix $X \in \mathbb{R}^{MN \times L}$. Note that, generally, the spatial resolution of the MSI is much higher than that of the HSI, that is, $M \gg m$, $N \gg n$, and the spectral resolution of the HSI is much higher than that of the MSI, that is, $L \gg l$. The objective is to reconstruct the HSI with both high spatial and high spectral resolution, $\bar{X} \in \mathbb{R}^{M \times N \times L}$, from the LR HSI and the HR MSI.

Figure 3.

General procedure of HSI-SR [15].
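To make the unfolding step concrete, here is a minimal NumPy sketch; all dimensions are hypothetical placeholders chosen only for illustration (Mastcam cubes have nine bands, and the 3x resolution ratio is mimicked here).

```python
import numpy as np

# Hypothetical dimensions for illustration only.
m, n, L = 48, 48, 9      # LR HSI: width, height, number of bands
M, N, l = 144, 144, 3    # HR MSI: 3x the spatial resolution, fewer bands

Y_h_bar = np.random.rand(m, n, L)   # placeholder 3D LR HSI cube
Y_m_bar = np.random.rand(M, N, l)   # placeholder 3D HR MSI cube

# Unfold each cube so that every row holds the full spectrum of one pixel.
Y_h = Y_h_bar.reshape(m * n, L)     # 2D LR HSI (mn x L)
Y_m = Y_m_bar.reshape(M * N, l)     # 2D HR MSI (MN x l)

# Folding the reconstructed 2D matrix X (MN x L) back into a cube
# simply inverts the reshape.
X = np.random.rand(M * N, L)        # placeholder for the SR result
X_bar = X.reshape(M, N, L)
```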

Due to hardware limitations, each pixel in an HSI or MSI may cover more than one constituent material, leading to mixed pixels. These mixtures can be assumed to be linear combinations of a few basis vectors (or source signatures). Both the LR HSI $Y_h$ and the HR MSI $Y_m$ can be expressed as linear combinations of $c$ basis vectors with their corresponding proportional coefficients (referred to as representations in deep learning), as in Eqs. (1) and (2), where $\Phi_h \in \mathbb{R}^{c \times L}$ and $\Phi_m \in \mathbb{R}^{c \times l}$ denote the spectral bases of $Y_h$ and $Y_m$, respectively. They preserve the spectral information of the images. $S_h \in \mathbb{R}^{mn \times c}$ and $S_m \in \mathbb{R}^{MN \times c}$ are the proportional coefficients of $Y_h$ and $Y_m$, respectively. Since the coefficients indicate how much of each spectral basis is present in the mixed pixel at a specific spatial location, they preserve the spatial structure of the HSI. The relationship between the HSI and MSI bases is given in the right part of Eq. (2), where $R \in \mathbb{R}^{L \times l}$ is the transformation matrix provided as a prior by the sensor [22, 23, 24, 25, 26, 27, 28, 29].

$$Y_h = S_h \Phi_h, \tag{1}$$

$$Y_m = S_m \Phi_m, \qquad \Phi_m = \Phi_h R, \tag{2}$$

$$X = S_m \Phi_h. \tag{3}$$

With $\Phi_h \in \mathbb{R}^{c \times L}$ carrying the high spectral information and $S_m \in \mathbb{R}^{MN \times c}$ carrying the high spatial information, the desired HR HSI, $X$, is generated by Eq. (3); see Figure 3. Since the ground truth $X$ is not available, the problem has to be solved in an unsupervised fashion. In addition, the linear combination assumption enforces the representation vectors of the HSI or MSI to be non-negative and sum-to-one, that is, $\sum_{j=1}^{c} s_{ij} = 1$, where $s_i$ is a row vector of either $S_m$ or $S_h$ [24, 29].
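As a sanity check of Eqs. (1)-(3), the following toy sketch builds random bases and Dirichlet-distributed coefficients and verifies the sum-to-one property; all sizes and matrix contents are hypothetical.

```python
import numpy as np

m, n, L = 48, 48, 9      # LR HSI dimensions (hypothetical)
M, N, l = 144, 144, 3    # HR MSI dimensions (hypothetical)
c = 5                    # number of spectral basis vectors (hypothetical)

Phi_h = np.random.rand(c, L)     # HSI spectral bases
R = np.random.rand(L, l)         # sensor transformation matrix
Phi_m = Phi_h @ R                # MSI bases, right part of Eq. (2)

# Dirichlet-distributed rows are non-negative and sum to one per pixel.
S_h = np.random.dirichlet(np.ones(c), size=m * n)    # (mn x c)
S_m = np.random.dirichlet(np.ones(c), size=M * N)    # (MN x c)

Y_h = S_h @ Phi_h   # Eq. (1): LR HSI
Y_m = S_m @ Phi_m   # Eq. (2): HR MSI
X = S_m @ Phi_h     # Eq. (3): desired HR HSI

assert np.allclose(S_m.sum(axis=1), 1.0)   # sum-to-one holds by construction
```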

The unsupervised architecture of uSDN is shown in Figure 4. It has three unique structures. First, the network consists of two encoder-decoder networks that extract the representations of the LR HSI and the HR MSI, respectively. The two networks share the same decoder, so that both the spectral and spatial information of the two modalities can be extracted in an unsupervised setting. Second, the representations of both modalities, $S_h$ and $S_m$, are enforced to follow a Dirichlet distribution, where the sum-to-one and non-negative properties are naturally incorporated into the network [30, 31, 32, 33, 34]. The solution space is further regularized with a sparsity constraint. Third, the angular difference between the representations of the two modalities is minimized to preserve the spectral information of the reconstructed HR HSI.

Figure 4.

Simplified architecture of uSDN [15].
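One way a network layer can produce non-negative, sum-to-one representations is the stick-breaking construction. The sketch below illustrates only the constraint itself; the actual uSDN layer learns the fractions through the stick-breaking parameters $v, u, \beta$ listed in Table 1, so this is an illustration, not the published implementation.

```python
import numpy as np

def stick_breaking(v):
    """Turn fractions v in (0, 1) (length c-1) into a length-c vector that
    is non-negative and sums to one: each entry takes a fraction of the
    remaining 'stick', and the last entry takes whatever is left."""
    s, remaining = [], 1.0
    for v_j in v:
        s.append(v_j * remaining)
        remaining *= 1.0 - v_j
    s.append(remaining)
    return np.array(s)

s = stick_breaking(np.array([0.3, 0.5, 0.2, 0.6]))
print(s.round(3), s.sum())   # non-negative entries summing to 1.0
```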

3. Mastcam image enhancement using uSDN with improvements

3.1 Applying uSDN for Mastcam enhancement

uSDN has been thoroughly evaluated on two widely used benchmark datasets, CAVE [35] and Harvard [36]; details can be found in [15, 16]. Here, we adopt uSDN to enhance the resolution of Mastcam images. As mentioned earlier, the right Mastcam has higher resolution than the left. Hence, we treat the right Mastcam images as the HR MSI and the left images as the LR HSI. Although uSDN was introduced to deal with the general HSI super-resolution problem, Mastcam image enhancement can be treated simply as a special case of HSI-SR.

For quantitative comparison, the root mean squared error (RMSE) and spectral angle mapper (SAM) are applied to evaluate the reconstruction error and the amount of spectral distortion, respectively.
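For reference, a minimal sketch of the two metrics is given below, with images stored as (pixels x bands) matrices following the conventions of Table 1; the exact scaling conventions used in the chapter may differ.

```python
import numpy as np

def rmse(X_est, X_ref):
    """Root mean squared error over all pixels and bands."""
    return np.sqrt(np.mean((X_est - X_ref) ** 2))

def sam(X_est, X_ref, eps=1e-12):
    """Mean spectral angle (in degrees) between corresponding pixel spectra."""
    dot = np.sum(X_est * X_ref, axis=1)
    norms = np.linalg.norm(X_est, axis=1) * np.linalg.norm(X_ref, axis=1)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)
    return np.degrees(np.arccos(cos)).mean()
```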

The results are shown in Figure 5; the reconstructed image is very close to the ground truth. Most fusion methods require the size of the high-resolution image to be an integer multiple of the size of the low-resolution image. Thus, we only compare our method with CNMF [29], which works for arbitrary image sizes. The results are shown in Table 2, where uSDN outperforms CNMF.

Figure 5.

Results of Mastcam image enhancement using uSDN. The left column shows the six bands from the left camera. The middle column shows the corresponding reconstructed results. The right column shows the six bands from the right camera.

Approach    RMSE     SAM
CNMF        0.056    2.48
uSDN        0.033    2.09

Table 2.

Evaluations for image enhancement from Mastcam.

3.2 Improvement based on uSDN

In this section, we summarize some further improvements of uSDN, obtained by fine-tuning the existing network structure in order to further enhance the fusion performance.

The structure of uSDN described in Section 3.1 is improved in two ways. First, in Section 3.1, the architecture consists of two deep networks for the representation learning of the LR HSI and the HR MSI, respectively, and only their decoders are shared. The spectral information (i.e., the decoder of the LR HSI network) is extracted through the LR HSI network, and the representation layer of the HR MSI is then optimized by enforcing spectral angle similarity. However, this introduces an additional cost function, the angular difference minimization, and the optimization procedure is time consuming. In the improved uSDN, most of the encoder weights of the HR MSI network are shared with the weights of the LR HSI encoder, and only a couple of encoder weights are updated during the HR optimization. In this way, the representations of both networks are reinforced to follow Dirichlet distributions with parameters following the same trends, and the representations extracted from the LR HSI match the patterns of those extracted from the HR MSI, as shown in Figure 6.

Figure 6.

Representations extracted from the LR HSI (top row) and the HR MSI (bottom row).
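Before turning to the second improvement, the weight-sharing scheme just described can be sketched in PyTorch as follows. The layer widths are hypothetical, and the Softmax is a stand-in for the stick-breaking representation layer sketched earlier, so this shows only the sharing idea rather than the published implementation.

```python
import torch.nn as nn

# Hypothetical widths; the LR input has L bands, the HR input l bands.
L_bands, l_bands, width, c = 9, 3, 64, 5

# Deeper encoder layers shared between the two modalities.
shared = nn.Sequential(nn.Linear(width, width), nn.Sigmoid(),
                       nn.Linear(width, c), nn.Softmax(dim=1))

# Each encoder keeps its own input layers but reuses the shared module.
lr_encoder = nn.Sequential(nn.Linear(L_bands, width), nn.Sigmoid(), shared)
hr_encoder = nn.Sequential(nn.Linear(l_bands, width), nn.Sigmoid(), shared)

# During the HR optimization, freeze the shared weights so that only the
# first couple of HR encoder layers are updated.
for p in shared.parameters():
    p.requires_grad = False
hr_trainable = [p for p in hr_encoder.parameters() if p.requires_grad]
```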

Second, to further reduce the spectral distortion of the estimated HR HSI, we adopt the $\ell_{2,1}$ loss instead of the $\ell_2$ loss, which encourages the network to reduce the spectral loss of each individual pixel. Compared to the network with the $\ell_2$ loss, the network with the $\ell_{2,1}$ loss extracts the spectral information of images more accurately. The $\ell_{2,1}$ loss not only reduces the spectral distortion of the estimated HR HSI, but also improves the convergence speed of the network.
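A minimal PyTorch sketch of the $\ell_{2,1}$ reconstruction loss, assuming the (pixels x bands) layout of Table 1: the $\ell_2$ norm is taken over each pixel's spectrum first, and the result is averaged over pixels, so every pixel's spectral error is penalized individually rather than pooled into one global term.

```python
import torch

def l21_loss(Y_hat, Y):
    """l_{2,1} loss: per-pixel spectral l2 norm, averaged over pixels.

    Y_hat and Y are (pixels x bands) tensors. The small epsilon keeps the
    gradient of the square root stable for near-zero residuals.
    """
    per_pixel = torch.sqrt(((Y_hat - Y) ** 2).sum(dim=1) + 1e-12)
    return per_pixel.mean()
```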

The result of the proposed method on an individual image is visualized in Figure 7. When the network is optimized with the $\ell_{2,1}$ loss, the difference between the estimated MSI and the ground truth MSI is very small, with an RMSE of 1.7428 and a SAM of 0.25615.

Figure 7.

The results using improved uSDN. The left column shows the first two bands from the left camera. The second column shows the corresponding reconstructed images from the improved uSDN. The third column shows the reference images from the right camera. The right column shows the absolute difference between the reconstructed images and the reference images.

4. Combination of Dirichlet-Net and U-Net

In this section, we propose to combine Dirichlet-Net with U-Net [37] to mitigate the mis-registration issue in the left and right Mastcam images.

In real scenarios, the images from the left and right cameras may not match each other perfectly even after registration. We therefore propose a combination of Dirichlet-Net and U-Net to further improve the fusion performance using imperfectly registered patches. The unsupervised architecture, shown in Figure 8, consists of two deep networks: an improved Dirichlet-Net for the representation learning of the MSI, and a U-Net that switches low-resolution spatial-information patches with high-resolution ones. The HR MSI of the left Mastcam image is then generated by combining its spectral information with the spatial information of improved resolution.

Figure 8.

The architecture of the proposed approach that combines Dirichlet-Net with U-Net.

From the last step in Figure 8, we are able to extract both the spectral and spatial information from the LR MSI (left Mastcam) and the HR MSI (right Mastcam). Although the scenes from the left and right cameras are not the same, we assume they share the same group of spectral bases. Thus, if we can improve the spatial information of the LR MSI using the HR MSI, the quality of the LR MSI can be enhanced.

The architecture of the U-Net is illustrated in the lower part of Figure 8. We first train the U-Net to recover the extracted spatial information, $S_m$, of the HR MSI, $Y_m$, with convolution and deconvolution layers. The convolution layers extract HR spatial features from $S_m$, and the deconvolution layers take these extracted features to rebuild the spatial information of $S_m$. We then extract features from the spatial patches $S_h$ of the LR MSI $Y_h$ with the same convolution layers and switch these feature patches with their most similar feature patches among the HR spatial features [38]. Finally, the left Mastcam image with enhanced resolution, $X$, is generated by feeding the switched patches into the deconvolution layers of the U-Net and the decoder of the Dirichlet-Net.
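The patch-switching step can be sketched as a brute-force nearest-neighbor search in feature space. Everything below is a simplifying assumption: a single-channel feature map, non-overlapping 3x3 patches, and cosine similarity as the matching criterion, whereas the chapter operates on multi-channel U-Net feature maps.

```python
import numpy as np

def switch_patches(lr_feat, hr_feat, p=3, eps=1e-12):
    """Replace each non-overlapping p x p patch of the LR feature map with
    its most similar patch (cosine similarity) from the HR feature map."""
    H, W = lr_feat.shape
    hH, hW = hr_feat.shape
    # Gather all HR candidate patches as flat, normalized vectors.
    coords = [(i, j) for i in range(hH - p + 1) for j in range(hW - p + 1)]
    cand = np.stack([hr_feat[i:i+p, j:j+p].ravel() for i, j in coords])
    cand_n = cand / (np.linalg.norm(cand, axis=1, keepdims=True) + eps)
    out = lr_feat.copy()
    for i in range(0, H - p + 1, p):
        for j in range(0, W - p + 1, p):
            q = lr_feat[i:i+p, j:j+p].ravel()
            q = q / (np.linalg.norm(q) + eps)
            bi, bj = coords[int(np.argmax(cand_n @ q))]
            out[i:i+p, j:j+p] = hr_feat[bi:bi+p, bj:bj+p]
    return out
```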

Here, we show experimental results from the proposed combination (Dirichlet-Net and U-Net) approach in Figures 9 and 10. We can observe that the reconstructed left Mastcam image is sharper than the raw MSI captured directly by the left camera, and the spectral distortion of the recovered MSI is small, even though only part of the high-resolution scene is available from the right camera. Note that, due to memory constraints, only a small patch can be recovered at a time, so there are some disconnected parts in the results. This issue is addressed in Section 5.

Figure 9.

The results of test image MSL_0002_0114_M1. The top row shows the six bands of raw images from the left camera. The bottom row shows the corresponding reconstructed images from the proposed method.

Figure 10.

The cropped results of test image MSL_0002_0114_M1. The top row shows the six bands of raw images from the left camera. The bottom row shows the corresponding reconstructed images from the proposed method.

5. Spatial representation improvement with transition learning

High spatial resolution images have one natural property: the transitions among pixel values are smooth. The patch-based method of Section 4 replaces the LR patches from the LR MSI representations $S_h$ with the most similar HR patches from the HR MSI representations $S_m$. Since the LR MSI and HR MSI are unregistered and there is no ground truth for the enhanced MSI, the patch-based improvement cannot guarantee smooth transitions in the reconstructed images; that is, the replaced patches may not match their neighbors. Therefore, in this section, we propose another structure, based on transition learning, to further improve the spatial resolution of the LR MSI. The main structure is shown in Figure 11.

Figure 11.

The architecture of the proposed transition learning approach.

To learn smooth transitions between pixels, we first extract sub-images from the representations $S_m$ of the HR MSI with stride 3, as shown in the lower part of Figure 11. Since the super-resolution factor is 3, we extract 9 sub-images from $S_m$. The network then learns the transitions between the center sub-image and the other 8 sub-images. Since the LR MSI and HR MSI have similar statistical distributions, we assume that the transitions among pixels in both modalities are the same. Therefore, the representations $S_h$ of the LR MSI can be treated as the center sub-image of the enhanced MSI, and the other 8 sub-images of the enhanced MSI can be estimated by feeding $S_h$ into the network trained on $S_m$. There are still residuals between the reconstructed and the ideal representations of $S_m$; this time, we adopt the principle described earlier to add the high-frequency residuals back onto the enhanced MSI.
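The stride-3 sub-image extraction amounts to sampling each of the 9 phase offsets of the representation map. A minimal sketch, assuming a single-channel map of hypothetical size:

```python
import numpy as np

S_m = np.random.rand(144, 144)   # hypothetical single-channel HR representation

# The 9 sub-images are the 9 phase offsets (r, c) under stride-3 sampling.
subs = [S_m[r::3, c::3] for r in range(3) for c in range(3)]  # each 48 x 48

center = subs[4]   # offset (1, 1): the center sub-image
# The transition network is trained on S_m to predict the other 8 sub-images
# from the center one; applying it to S_h of the LR MSI then fills in the
# 8 missing sub-images of the enhanced representation.
```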

Here, the experimental results of the proposed approaches are compared with the results from bicubic interpolation and the state-of-the-art single-image super-resolution method EnhanceNet [39], as shown in Figures 12-14. Note that, since EnhanceNet only offers pre-trained weights for 4X super-resolution, we show its 4X reconstruction results for a fair comparison, in case a down-sampling step would reduce the quality of the reconstructed images. Bicubic interpolation does not improve the resolution much, and EnhanceNet, which was trained on a natural image dataset, works poorly on remote sensing images. Compared to these two methods, the proposed methods not only improve the spatial resolution of the LR MSI, but also preserve the spectral information well, even though the images from the left and right cameras are not registered. The transition-based approach works better than the patch-based one because it learns the relationship between the reconstructed pixels.

Figure 12.

The results of the test image MSL_0002_0114_M1. The left column shows the six bands of raw images from the left camera. The second, third, fourth and fifth columns show the corresponding reconstructed images from Bicubic, EnhanceNet, the proposed patch-based method and the residual-based transition-learning method, respectively.

Figure 13.

The cropped results of the test image MSL_0002_0114_M1. The left column shows the six bands of raw images from the left camera. The second, third, fourth, and fifth columns show the corresponding reconstructed images from Bicubic, EnhanceNet, the proposed patch-based method and the residual-based transition-learning method, respectively.

Figure 14.

The cropped results of the test image MSL_0002_0114_M1. The left column shows the six bands of raw images from the left camera. The second, third, fourth, and fifth columns show the corresponding reconstructed images from Bicubic, EnhanceNet, the proposed patch-based method and the residual-based transition-learning method, respectively.

6. Conclusions

In this chapter, we summarized the application of several deep learning-based image fusion algorithms to enhance Mastcam images from the Mars rover Curiosity. The first algorithm, termed uSDN, is based on the Dirichlet-Net, which incorporates the sum-to-one and sparsity constraints. Two improvements of uSDN were then investigated. Finally, a transition learning-based approach was developed. Promising results on actual Mastcam images were presented. More research will be carried out in the future to continue these investigations.

Acknowledgments

This work was supported in part by NASA NNX12CB05C and NNX16CP38P.

© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License, which permits use, distribution and reproduction for non-commercial purposes, provided the original is properly cited.
