Multi-platform data introduce new possibilities in the context of data fusion, as they make it possible to exploit several remotely sensed images acquired by different combinations of sensors. This scenario is particularly interesting for the sharpening of hyperspectral (HS) images, due to the limited availability of high-resolution (HR) sensors mounted onboard the same platform as the HS device. However, the differences in the acquisition geometry and the nonsimultaneity of such observations introduce further difficulties whose effects have to be taken into account in the design of data fusion algorithms. In this study, we present the most widespread HS image sharpening techniques and assess their performance by testing them on real acquisitions taken by the Earth Observing-1 (EO-1) and the WorldView-3 (WV3) satellites. We also highlight the difficulties arising from the use of multi-platform data and, at the same time, the benefits achievable through this approach.
- hyperspectral image sharpening
- Hyperion data
- WorldView-3 images
- data fusion
- remote sensing
Hyperspectral (HS) data often provide great insights in the field of Earth Observation (EO) for the analysis and monitoring of the planet's surface [1, 2]. As they embed very detailed spectral information about the observed scene, their employment has become necessary in many applications, including natural vegetation classification and monitoring, geological map construction, chemical property detection, land cover observation, and water resources management [1, 2]. The widespread use of hyperspectral data has pushed toward the development of acquisition devices with increasing capabilities, the most recent of which are characterized by a ground spatial interval (GSI) even below 10 m.
However, this spatial resolution is still insufficient in many fields, such as geology, agriculture, and land cover classification. Data fusion techniques provide a possible solution to this issue, which has been validated in several studies performed on both real and simulated datasets [6, 7, 8]. In principle, high spatial resolution improvement factors can be attained for hyperspectral data, but the scarcity of exploitable companion high-resolution (HR) data represents a major issue. In fact, only very few examples exist of hyperspectral sensors co-located onboard the same platform with high spatial resolution devices, such as panchromatic (PAN) and/or multispectral (MS) sensors. Since the Earth Observing-1 (EO-1), which mounted both a panchromatic and a multispectral camera onboard, has been decommissioned, the only remaining satellites to ensure the availability of companion panchromatic sensors are the new Prisma and HypXIM, which are characterized by a spatial resolution six and four times higher than that of the HS instrument, respectively.
The presence of a high-resolution sensor mounted on the same platform represents the ideal setting for the data fusion problem, since the two images to combine are almost simultaneously acquired from the same point of view. However, in addition to the aforementioned difficulty in finding platforms with this feature, the resolution ratio between the HS image and the companion high-resolution image is constrained to be very small, ranging from 3 (EO-1 case) to 6 (Prisma case). Further resolution enhancement would require an additional upsampling procedure at some point in the algorithmic stack, thus strongly compromising the quality of the final fused product.
An alternative is constituted by the fusion of data acquired by multiple platforms, which, on the other hand, entails further difficulties related to the different observation geometries and the unavoidable lack of simultaneity between the acquisitions. Although this approach has been extensively investigated in the literature, the studies have almost always utilized simulated data [9, 10], thus ignoring the two cited issues that affect real data. A previous study based on real acquisitions was performed with temporally aligned images acquired by drones and aircraft.
The current study focuses on multi-platform real data and aims at illustrating the state of the art of low-level data fusion algorithms, both classical and recent, applicable to these data. Classical algorithms were adapted from the pansharpening literature, namely, from studies concerning the fusion of a panchromatic and a multispectral image. They can be straightforwardly applied to the HS/PAN fusion problem [9, 12, 13], but they require a preliminary assignation phase when the high-resolution image is constituted by a multispectral image. Indeed, a specific channel of the MS image has to be assigned to each hyperspectral band to complete the fusion process by means of classical techniques. The assignation algorithm (AA) significantly impacts the final results, and, for this reason, several algorithms have been proposed for completing this task [14, 15]. The second group of fusion algorithms has been purposely developed for the fusion of HS and MS data and thus can be straightforwardly applied to the problem at hand. It includes suitable modifications of classical algorithms (hypersharpening) [16, 17] and applications of more general statistical approaches, as, for instance, the Bayesian framework, which is employed with naive [18, 19] and sparse Gaussian priors and with alternative regularization terms [21, 22].
Three different datasets collected from the Earth Observing-1 and the WorldView-3 (WV3) satellites were employed in this study to evaluate the performance of the fusion algorithms. The tests were conducted according to the reduced resolution (RR) assessment procedure, based on Wald's protocol. Specifically, the available HS image is employed as the reference (or ground truth (GT)), and the images to fuse are constituted by properly degraded versions of the available data. This facilitates the use of accurate indexes for evaluating the quality of the final products, thanks to the presence of a reference image. The availability of real data allowed us to draw conclusions about the behavior of the different types of fusion algorithms and, in the case of classical pansharpening, about the assignation approaches.
The work is organized as follows. Section 2 describes the problem under consideration, including some details on the main fusion techniques employed in hyperspectral image sharpening. The conducted experimental analysis is detailed in Section 3, whereas the outcomes are reported in Section 4. Finally, conclusions are drawn in Section 5.
2. The hyperspectral sharpening framework
The data fusion procedure to sharpen hyperspectral images consists of augmenting the spatial information contained in a low-resolution (LR) hyperspectral image by injecting information from high-resolution data.
In the following, we will denote a generic acquisition composed of N channels as a set of bidimensional matrices, as follows: X = {X_k}, with k = 1, …, N. More in detail, the HS datacube will be denoted by H, an MS acquisition by M, and a PAN image by P. The enhancement ratio, namely, the ratio between the spatial resolution of the original HS image and the desired spatial resolution, is indicated by R. We restrict the analysis of the fusion problem to the combination of two images, i.e., the details to be injected are extracted from a single image.
2.1 Classical pansharpening approaches
Classical pansharpening algorithms are designed to operate with a monochromatic image, which acts as the source from which details are extracted and injected into the LR image. Consequently, as long as the HR image is monochromatic, the framework of classical pansharpening can be directly applied to this scenario, with the straightforward adjustment of using the HS image as the LR source image to fuse. Conversely, when the details are extracted from a multichannel image, the application of pansharpening approaches requires an assignment procedure between each HS band and a specific channel of the high-resolution MS image.
Data fusion through classical pansharpening approaches can be formalized by the following equation:

Ĥ_k = H̃_k + G_k ⊙ (P − P_L,k),    (1)

which represents the sharpening procedure of a generic k-th channel of the HS image. In Eq. (1), the estimated HR hyperspectral image is indicated by Ĥ_k, while H̃_k denotes an upsampled (interpolated) version of the original image to match the scale of P. The details, represented by the difference between the HR image P and its low-pass version P_L,k, are additively injected into the latter image by properly weighting them through an element-by-element matrix product (indicated by the ⊙ operator) with the injection coefficient matrix G_k. It is worth recalling that both the details and the matrix G_k in (1) are band-dependent, since some methods require a preliminary equalization of the HR image and G_k is often optimized for each channel.
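As a concrete illustration, the additive injection scheme of Eq. (1) can be sketched in a few lines of numpy. The function and array names below are illustrative assumptions, not part of the chapter's notation:

```python
import numpy as np

def inject_details(hs_up, hr, hr_lp, gain):
    """Additive detail injection for one HS channel (sketch of Eq. (1)).

    hs_up : interpolated LR hyperspectral channel, shape (H, W)
    hr    : high-resolution image (already equalized to the channel)
    hr_lp : low-pass version of hr at the HS scale
    gain  : injection coefficient matrix (or scalar) weighting the details
    """
    # details = HR image minus its low-pass version, weighted element-wise
    return hs_up + gain * (hr - hr_lp)
```

The element-by-element product of Eq. (1) maps directly onto numpy broadcasting, so `gain` may be a full matrix or a single scalar per band.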
2.1.1 Component substitution and multi-resolution analysis algorithms
Different classical pansharpening techniques can be specified according to the particular definition of the injection gain matrix G_k and the method used for calculating the low-resolution image P_L,k. In the literature, the key taxonomy for the macro-categorization is related to the technique used to compute P_L,k, as two separate classes of methods arise with very distinct properties. In particular, P_L,k can be obtained either by properly combining the channels of the HS image or by spatially degrading the HR image P. The first approach defines the so-called component substitution (CS), or spectral, methods, whose name underlines that the fusion is obtained by substituting the HS intensity component with the HR image P. This class includes both archetypical methods, such as the Brovey transform (BT), the intensity-hue-saturation (IHS) [27, 28], the principal component decomposition [29, 30, 31], and the Gram-Schmidt (GS) expansion, and more recent approaches, such as the Gram-Schmidt adaptive (GSA) method, which is able to achieve state-of-the-art performance.
The second class of approaches is known in the literature as multi-resolution analysis (MRA), or spatial, methods, since they operate directly in the spatial domain to obtain P_L,k through a multi-scale decomposition. The MRA class includes a wide plethora of methods, which exploit a variety of linear filters (box filters [34, 35], Gaussian filters, and à trous wavelet filters) or nonlinear decompositions (morphological filters).
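To make the CS/MRA distinction concrete, the following minimal numpy sketch computes the low-pass component in both ways: as a weighted combination of the (upsampled) HS channels, as in CS methods, and by spatially degrading the HR image, as in MRA methods. The names are illustrative, and the box filter is only a crude stand-in for an MTF-matched kernel:

```python
import numpy as np

def lowpass_cs(hs_up, weights):
    # CS-style: synthesize the intensity component as a weighted
    # combination of the upsampled HS channels (shape: bands x H x W)
    return np.tensordot(weights, hs_up, axes=(0, 0))

def lowpass_mra(hr, size=3):
    # MRA-style: spatially degrade the HR image with a box filter
    # (an assumption standing in for an MTF-matched low-pass kernel)
    pad = size // 2
    padded = np.pad(hr, pad, mode="edge")
    out = np.zeros_like(hr, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + hr.shape[0], dx:dx + hr.shape[1]]
    return out / (size * size)
```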
The two classes have different characteristics, both in terms of the visual aspect of the sharpened images and in terms of robustness against nonideal working conditions. Specifically, methods belonging to the CS class usually yield final products featuring an accurate reproduction of the spatial details, with an intrinsic robustness to limited spatial misalignments between the two images to fuse. Images produced by MRA approaches are instead characterized by a higher spectral coherence with the original LR image, possibly even reducing the effects of temporal misalignments among the data to be combined.
2.1.2 Assignation algorithms
As seen in the previous section, in the case of HS/PAN fusion, the only possible choice for the HR data in (1) is represented by the PAN image. Conversely, for the HS/MS fusion, any of the MS channels can act as HR data, demanding an assignation algorithm to couple a specific MS band with a given HS channel. This problem was addressed in previous papers by defining a series of criteria for selecting the most suitable MS channel [15, 41]. The possible approaches can be either data-independent, exclusively utilizing the characteristics of the sensors, or data-dependent, for which the assignment depends on the particular datasets. Previous analyses highlight the superior performance of the second approach, but at the cost of requiring an additional computational effort to evaluate the new assignation for each new dataset.
Among the data-independent approaches, acceptable performance can be obtained by minimizing the distance between the centroid of the relative spectral response (RSR) of the sensor acquiring the channel H_k and the centroids of the RSRs of the HR sensor. This method, nicknamed CEN-AA, assigns to H_k the MS channel M_j* that verifies the condition:

j* = arg min_j | c(H_k) − c(M_j) |,

where c(·) defines the centroid of the generic relative spectral response (RSR) of a given channel.
For the AA step, the overall best results in terms of data fidelity of the reconstructed fused image are obtained by employing the algorithms proposed in [15, 41]. The first consists in maximizing the cross correlation (CC) between H_k and the MS channels and is thus denoted in the following as CC-AA. Formally, it consists in coupling H_k with the HR channel M_j* such that:

j* = arg max_j ⟨H_k, M_j^L⟩ / (‖H_k‖ ‖M_j^L‖),

where M_j^L indicates the image obtained by degrading the resolution of M_j by means of a filter matched to the modulation transfer function (MTF) of the j-th MS channel and a downsampling by a factor R; ⟨·,·⟩ represents the scalar product between the vectorized versions of two generic channels.
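A minimal numpy sketch of the CC-based assignation follows; it assumes the MS channels have already been degraded to the HS scale, and the function name is illustrative:

```python
import numpy as np

def cc_assign(hs_band, ms_degraded):
    """Return the index of the MS channel maximizing the (zero-mean)
    cross-correlation with the given HS band.

    ms_degraded : sequence of MS channels already degraded to the HS scale
    """
    h = hs_band.ravel() - hs_band.mean()
    best, best_cc = 0, -np.inf
    for j, m in enumerate(ms_degraded):
        v = m.ravel() - m.mean()
        cc = h @ v / (np.linalg.norm(h) * np.linalg.norm(v) + 1e-12)
        if cc > best_cc:
            best, best_cc = j, cc
    return best
```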
The alternative approach aims at evaluating the spectral coherence of each available HR channel if it acts as a substitute of H_k. In order to quantify this criterion, let us build the supporting datacubes H^(j) by substituting the equalized band M̄_j at the place of H_k, and compare them to the original image H. Formally, H^(j) is defined as:

H^(j) = { H_1, …, H_{k−1}, M̄_j, H_{k+1}, …, H_N },

where M̄_j is obtained by equalizing the first two statistical moments of M_j^L w.r.t. H_k. The AA rule is defined by setting the assigned channel equal to the one that satisfies the equation:

j* = arg min_j SAM( H^(j), H ),

in which SAM(·,·) denotes the spectral angle mapper between H^(j) and H. Accordingly, this approach is named SAM-AA by the authors.
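The SAM-based rule can be sketched as follows, with an illustrative SAM implementation (mean spectral angle over all pixels) and a simple two-moment equalization; all names and shapes are assumptions:

```python
import numpy as np

def sam(cube_a, cube_b):
    # mean spectral angle (radians) between corresponding pixel spectra;
    # cubes have shape (bands, H, W)
    a = cube_a.reshape(cube_a.shape[0], -1)
    b = cube_b.reshape(cube_b.shape[0], -1)
    num = (a * b).sum(axis=0)
    den = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0) + 1e-12
    return np.mean(np.arccos(np.clip(num / den, -1.0, 1.0)))

def sam_assign(hs_cube, k, ms_degraded):
    # equalize each candidate channel to band k (first two moments),
    # substitute it into the cube, and keep the channel whose
    # substituted cube has the smallest SAM w.r.t. the original
    best, best_sam = 0, np.inf
    for j, m in enumerate(ms_degraded):
        eq = (m - m.mean()) * (hs_cube[k].std() / (m.std() + 1e-12)) + hs_cube[k].mean()
        test = hs_cube.copy()
        test[k] = eq
        s = sam(test, hs_cube)
        if s < best_sam:
            best, best_sam = j, s
    return best
```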
2.2 Methods designed for hyperspectral image sharpening
Several different options have been recently developed ad hoc for the sharpening of HS data by using complementary images of a different nature. A first option is to modify existing pansharpening algorithms to account for the specific characteristics of the HS data. A different approach consists in developing a completely novel method by resorting to a suitable mathematical framework, such as the widely exploited Bayesian statistical formalization.
2.2.1 Hypersharpening

A very effective method for sharpening HS images, known as hypersharpening [16, 17], relies upon the construction of a simulated HR image P_k for each HS channel, obtained as a linear combination of the available HR channels:

P_k = Σ_j w_{k,j} M_j,

in which the weights w_{k,j} are optimized through linear regression. Equalizing the mean and variance of P_k with respect to H̃_k yields an improved version of hypersharpening.
The term P_L,k in (1) is obtained with the same strategies proposed by MRA methods, by degrading P_k via an appropriate filter, such as the MTF-matched generalized Laplacian pyramid. The fusion formula (1) is completed by defining the injection gain matrix G_k, which is derived through a regression-based model. Namely, G_k is a constant matrix with entries:

g_k = cov( H̃_k, P_L,k ) / var( P_L,k ),

where cov(·,·) denotes the covariance operator and var(·) the variance.
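The regression step of hypersharpening, i.e., fitting the combination weights at the LR scale and applying them to the full-resolution MS channels, can be sketched with a least-squares solve. The function and argument names below are illustrative assumptions:

```python
import numpy as np

def hypersharp_pan(hs_band_lr, ms_degraded, ms_hr):
    """Build a per-band synthetic HR image via linear regression
    (hypersharpening sketch).

    hs_band_lr  : LR HS band, shape (h, w)
    ms_degraded : MS channels degraded to the HS scale, shape (C, h, w)
    ms_hr       : MS channels at full resolution, shape (C, H, W)
    """
    A = ms_degraded.reshape(ms_degraded.shape[0], -1).T  # (pixels, C)
    y = hs_band_lr.ravel()
    w, *_ = np.linalg.lstsq(A, y, rcond=None)            # regression weights
    # apply the LR-fitted weights to the full-resolution channels
    return np.tensordot(w, ms_hr, axes=(0, 0))
```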
2.2.2 Bayesian approaches
Most novel methods for sharpening HS images exploit the Bayesian statistical formalization of the fusion problem. In this approach, both the LR and HR available data are modeled as transformations, operating, respectively, in the spatial and in the spectral domain, of an unknown ideal HR hyperspectral image denoted as X.
Accordingly, the equation relating the target HR image and the available LR image is written as:

h = S B x + n_h,    (9)

where the lowercase letters denote the versions of the matrices in lexicographic order (obtained by concatenating the columns of each channel), B is the blurring matrix, S is the downsampling matrix, and n_h is the noise accounting for the unmodeled effects corrupting the relationship. Eq. (9) is coupled either to:

p = R_P x + n_p    or    m = R_M x + n_m,

in the HS/PAN and HS/MS cases, respectively. These express the functional models relating x to p or m and include the factors R_P and R_M, which model the RSRs of the HR sensors, and the noise addends n_p and n_m, accounting for the inaccuracy of the former terms.
The Bayesian approach based on the maximum a posteriori probability (MAP) consists in estimating the target vector x through the formula:

x̂ = arg max_x p( x | h, y ),    (12)

in which we denote by y the available HR image (p or m). A reliable solution of (12) can be found by regularizing the problem, namely, by adding a penalization term to the data-fidelity functional. Examples of widely employed regularization terms include Gaussian priors [43, 44] or the vector total variation (VTV).
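As a toy illustration of a MAP estimate with a naive Gaussian prior, the problem reduces to a Tikhonov-regularized least squares, which admits a closed-form solution when the operators are formed as explicit matrices (feasible only for very small problems; practical methods never build these matrices). All names are illustrative assumptions:

```python
import numpy as np

def map_fuse(Phi_h, y_h, Phi_m, y_m, lam):
    """Closed-form minimizer of
       ||y_h - Phi_h x||^2 + ||y_m - Phi_m x||^2 + lam * ||x||^2,
    i.e., a MAP estimate under Gaussian noise and a zero-mean
    Gaussian prior (toy setting with explicit matrices)."""
    A = Phi_h.T @ Phi_h + Phi_m.T @ Phi_m + lam * np.eye(Phi_h.shape[1])
    b = Phi_h.T @ y_h + Phi_m.T @ y_m
    return np.linalg.solve(A, b)
```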
3. Quality assessment of fusion products
In this section, we present the performance assessment setup for the sharpening of the HS data. The objective is to test the ability of fusion algorithms to reach a resolution enhancement factor that goes beyond the limitations of single-platform setups. In the specific testbed, the HS data are constituted by acquisitions taken by the Hyperion sensor, which is characterized by a GSI of 30 m. The satellite platform also features a PAN sensor, called ALI, whose GSI is 10 m, corresponding to a nominal enhancement factor R = 3. For more ambitious factors, two extra scenarios are considered; in particular, a very interesting comparison can be made at R = 6 and R = 12 by analyzing the different behaviors of single- and multi-platform setups with a selection of 12 state-of-the-art fusion algorithms. Specifically, the single-platform case requires a preliminary interpolation of the ALI images, here performed via a convolution with a 45-tap interpolation kernel. The multi-platform case employs, as companion source image, the MS imagery acquired by WorldView-3, which instead has to be downsized to the target resolution, as it is characterized by a smaller GSI than the target one for all the considered enhancement factors. The decimation procedure is performed by employing a filter mimicking the modulation transfer function of the MS sensor, followed by downsampling.
We remark that this study ignores the contribution of the ALI MS and WV3 PAN sensors. The former has the same GSI as the Hyperion sensor, making its information mostly redundant. Regarding the latter, the native GSI of the WV3 MS sensor already exceeds the target resolution for all the enhancement factors under examination. Consequently, it is preferable to employ the MS sensor, as it is characterized by a better spectral resolution, as shown in previous studies [14, 41].
3.1 Assessment procedure
The assessment procedure has been carried out at reduced resolution, namely, the original HS image is used as the reference, and the images to fuse are obtained by degrading the available images by a factor equal to the resolution enhancement factor R. The adopted Wald's protocol requires the reproduction of the characteristics of the fusion problem at a lower resolution. Accordingly, all the available images are degraded by using an MTF-shaped filter matched to the specific sensor and a downsampling by a factor R.
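The Wald-protocol degradation step (low-pass filtering followed by decimation) can be sketched as follows. The separable Gaussian kernel is only a stand-in for a calibrated MTF-matched filter, and the default sigma is a rough assumption, not a sensor-specific value:

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    # normalized 1-D Gaussian kernel
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def mtf_degrade(img, R, sigma=None):
    """Reduced-resolution degradation sketch: separable Gaussian
    low-pass (stand-in for an MTF-matched filter) plus R-fold decimation."""
    if sigma is None:
        sigma = R / 2.0  # rough assumption, not a calibrated MTF value
    k = gaussian_kernel(sigma, radius=2 * int(np.ceil(sigma)))
    # separable convolution: filter rows, then columns
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    low = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)
    return low[::R, ::R]
```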
The reduced resolution assessment protocol allows the use of many accurate quality indexes, since the ground truth image is available. In this work, we consider the spectral angle mapper (SAM) for evaluating the spectral distortion and the erreur relative globale adimensionnelle de synthèse (ERGAS) for assessing the radiometric distortion. The vectorial Q2 index is used for obtaining a comprehensive measure of the overall image quality. Finally, we employ the universal image quality index (UIQI), or Q-index, proposed by Wang and Bovik, for performing a band-by-band comparison of the final product with the reference image.
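As an example of these reference-based indexes, ERGAS can be computed in a few lines; the implementation below follows the usual definition (100/R times the root mean, over bands, of the squared RMSE normalized by the squared band mean), with illustrative argument names:

```python
import numpy as np

def ergas(fused, ref, R):
    """ERGAS radiometric distortion index; cubes have shape (bands, H, W),
    R is the resolution enhancement ratio. Lower is better, 0 is ideal."""
    bands = ref.shape[0]
    acc = 0.0
    for k in range(bands):
        rmse2 = np.mean((fused[k] - ref[k]) ** 2)      # per-band squared RMSE
        acc += rmse2 / (ref[k].mean() ** 2 + 1e-12)    # normalize by band mean
    return 100.0 / R * np.sqrt(acc / bands)
```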
3.2 Datasets

Three datasets are used for illustrating the capabilities of data fusion algorithms in producing very high-resolution hyperspectral images. The images have been acquired by the Earth Observing-1 and WorldView-3 satellites. The different settings allow us to examine the behavior of the sharpening algorithms in the presence of the most common issues implied by multi-platform data fusion, namely, the different points of view and the temporal changes in the illuminated scenes between the two acquisitions. In this study, we employ the visible near-infrared (VNIR) bands B09-B57, acquired by the Hyperion sensor, as HS data. The single-platform companion data are constituted by the PAN images collected by the ALI sensor, having a 10 m spatial resolution. All EO-1 data share a radiometric resolution of 15 bits. The multi-platform data have been acquired by the WV3 satellite. They are represented by an MS image composed of eight channels (coastal, blue, green, yellow, red, red edge, NIR1, and NIR2) with a radiometric resolution of 11 bits and an original spatial resolution of 1.2 m.
The employed datasets are briefly described below:
Harlem dataset: the images have been collected in New York, USA, in the neighborhoods of the Harlem River. The size of the Hyperion and PAN ALI data, acquired on July 21, 2016, is 144 × 144 pixels and 432 × 432 pixels, respectively, while the native dimension of the WV3 MS image, acquired on June 9, 2016, is 4320 × 4320 pixels.
Agnano dataset: the images refer to the area of the Agnano Racecourse, next to the city of Naples, Italy. The size of the Hyperion data is 144 × 72 pixels, and thus the corresponding ALI and WV3 images are composed of 432 × 216 pixels and 4320 × 2160 pixels, respectively. The acquisition dates of the EO-1 and WV3 sensors are May 20, 2015, and June 8, 2015, respectively.
Capodichino dataset: the images refer to the eastern surroundings of Naples, Italy, around Capodichino Airport. The images are composed of the same number of pixels as the Agnano dataset and were acquired on May 20, 2015, and on February 4, 2002, by the EO-1 and WV3 satellites, respectively.
3.3 Fusion algorithms
We compare several fusion algorithms to fully assess the quality of HS products achievable through data acquired by a single or multiple platforms. We first focus on the use of classical pansharpening approaches, which constitute an almost ready-to-use solution, and then present the purposely designed methods. Among the wide plethora of available pansharpening methods, we employed the following CS and MRA methods: the Brovey transform (BT), Gram-Schmidt (GS) spectral sharpening, and the Gram-Schmidt adaptive (GSA) method, belonging to the CS class; the additive wavelet luminance proportional (AWLP), the generalized Laplacian pyramid with MTF-matched filter using the high-pass modulation scheme (GLP-HPM), and the regression-based injection model (GLP-CBD), belonging to the MRA class.
Among the second group of approaches, we consider the hypersharpening (Hyper) method, developed in [16, 17], and four Bayesian techniques, namely, the coupled nonnegative matrix factorization (CNMF) , the naive Gaussian prior (Bay-N) , the sparsity promoted Gaussian prior (Bay-S) , and the hyperspectral superresolution (HySure) .
Finally, we report the results of a method that upscales the original HS image to the target scale by simple interpolation. We denote this method as EXP; it is carried out through a 45-tap interpolation filter and also constitutes the baseline for the more complex sharpening methods presented here.
4. Experimental results
The performance of the fusion algorithms is evaluated both by calculating the numerical values of the chosen quality indexes and by assessing the final products through visual inspection.
The first dataset, Harlem, has the purpose of illustrating the capability of the compared algorithms to produce a significant improvement of the spatial quality of the original HS images. To this aim, we report in Figure 1 the results related to all the tested enhancement factors (R = 3, 6, 12), using one exemplary algorithm of each class. The RGB images are built by averaging groups of channels in the red, green, and blue frequency ranges (B29–B33, B17–B22, and B11–B15, respectively) to construct the required channels. Naturally, all the reference images (or ground truth) coincide, since they are represented by the original HS image; see Figure 1(a), (m), and (y). On the contrary, the simulated LR HS images, whose upsampled versions (EXP) are reported in Figure 1(g), (s), and (ae), get more and more degraded as the enhancement factor increases.
Some introductory considerations can be drawn from the images in Figure 1, obtained by using the SAM-AA for coupling the MS bands to the HS channels. In fact, the first remarkable result is the high quality of the final products achievable even at very high enhancement factors. More in detail, the classical GLP-CBD and the Hyper approach (the latter constituting a generalization of the former, since both employ a regression-based injection scheme) produce the most appealing sharpened images. They are able to greatly enhance the spatial content of the original HS image while preserving an appreciable coherence of the colors.
On the other hand, the images achievable by using the ALI PAN have a satisfactory aspect only for R = 3, as could be expected from the 10 m resolution of the employed HR sensor. The effect of the interpolation is clearly visible in Figure 1(n) and (z); thus, this approach could be preferable only when spatial or temporal misalignments among the multi-platform data cannot be avoided.
These results match the index values contained in Table 1. Actually, the numerical values point out that, in most cases, the use of perfectly aligned images coming from a different satellite can produce images with superior quality even in the case of R = 3. Here, the closer correspondence between the details extracted from the MS channels and the missing spatial information of the HS image can justify the outcome.
[Table 1: values of the quality indexes, related to the reduced resolution assessment procedure, using the Harlem dataset, for R = 3, 6, and 12.]
Finally, we note that the comparison among the assignation algorithms mainly underlines that the two methods optimizing the assignation according to the specific dataset achieve almost the same results.
The other two scenes allow us to gain more insight into the comparison of the sharpening algorithms and of the assignation algorithms. The EO-1 data have been extracted from the same images, while the multi-platform data have very different characteristics. In fact, while the WV3 image of the Agnano dataset was acquired within a few days of the EO-1 data, the WV3 image of the Capodichino dataset was collected more than 10 years earlier. Accordingly, the layout of the objects present in the Capodichino scene is very different between the two passages, also because the area contains rapidly changing objects. A comparison of the two images can be achieved by looking at Figure 2(a) and (b). The latter scene refers to Naples Airport and contains a plane on the runway that is not detectable in the corresponding WV3 MS image shown in Figure 3(d). Furthermore, different man-made objects are present in the illuminated area at the two acquisition times.
The results related to the Agnano dataset (see Table 2) confirm the conclusions drawn from the analysis of the Harlem dataset. They correspond to the most typical situation, in which the images to fuse are ordered from a data provider, minimizing as much as possible the difference between the passage times of the two satellites. Accordingly, in both cases, the illuminated areas contain very similar features that make the multi-temporal data particularly valuable. However, both also represent almost ideal cases, since rapidly changing objects, for example, the aircraft present in the Capodichino dataset, can vary even between very close passages. Accordingly, particularly interesting is the case of the Capodichino dataset, which gives rise to rather different results, reported in Table 3. In fact, the single-platform setting almost always yields better results, even if the visual appearance of the images related to the multi-platform approach is often preferable in terms of quantity of injected details (see Figure 3). Actually, a more accurate analysis evidences that the multi-platform products yield a sharpened image in which the plane is absent (especially for CS methods). Moreover, the spectral quality of the final products is significantly compromised if the WV3 MS images are used.
[Table 2: values of the quality indexes for the Agnano dataset, for R = 3, 6, and 12.]
[Table 3: values of the quality indexes for the Capodichino dataset, for R = 3, 6, and 12.]
The spatial quality differences among the various algorithms can be further investigated by resorting to a quality index that allows a band-by-band analysis of the algorithms' outputs. To this aim, we report in Figures 4 and 5 the behavior of the Q-index as a function of the HS band. The two figures reveal both similarities and discrepancies in the algorithms' performance. In particular, we can note that, for the HS channels whose support is contained in the frequency range covered by the ALI PAN, the use of single-platform data is always preferable, except for the case of the Hyper algorithm applied to the Agnano dataset with R = 6. Clearly, this consideration is all the more true in the experiment related to the Capodichino dataset. Instead, different trends are experienced for the near-infrared (NIR) bands. All the algorithms (except the GSA algorithm with R = 3) obtain better performance by using multi-platform data on the Agnano dataset. On the contrary, using the Capodichino dataset, the GSA algorithm always obtains superior results by using the ALI PAN image, while the other two methods obtain a slightly better performance in the NIR region that is not able to compensate for the poor quality in the visible range, thus resulting in an inferior overall performance of the multi-platform approach. Finally, it is very clear from both Figures 4 and 5 that the CC-AA and the SAM-AA algorithms are able to obtain significant improvements with respect to CEN-AA, especially in the NIR frequencies.
The aim of this work was to illustrate the recent advances in the field of hyperspectral image sharpening through single-platform and multi-platform data. The study was conducted on real data acquired by the Earth Observing-1 and the WorldView-3 satellites in order to highlight the practical issues to be addressed when fusing images acquired by different platforms. We focused on well-known algorithms based on classical approaches borrowed from the pansharpening literature and on purposely developed techniques. We evaluated the possibility of completing the fusion process both in the absence and in the presence of temporal misalignments between the scenes illuminated by the sensors mounted on the two satellites. The study highlighted the suitability of multi-platform data especially in the presence of high resolution enhancement factors. Actually, in some cases, the use of multispectral images also proved useful at low resolution enhancement factors, a result that can be justified by considering that the details contained in the MS channels are able to provide more specific spatial information for a given HS channel.