Feature similarities of images restored from various super-resolution methods.
For decades, super-resolution has been widely applied to improve the spatial resolution of an image without hardware modification. Despite its advantages, super-resolution is ill-posed: the problem admits multiple solutions. Therefore, scholars have proposed regularization approaches to address the challenge. The present work introduces a parameterized diffusion-steered regularization framework that integrates total variation (TV) and Perona-Malik (PM) smoothing functionals into the classical super-resolution model. The goal is to establish an automatic interplay between the TV and PM regularizers such that only their useful properties are retained to make the super-resolution problem well-posed and, hence, to generate reliable and appealing results. Analysis of the proposed resolution-enhancement model shows that it responds appropriately to different image regions. Experimental results provide further evidence that the proposed model outperforms the classical approaches.
Before delving into super-resolution imaging, let us discuss the term resolution. Most people, particularly those not in the imaging field, define resolution broadly as the physical size of an image. For a two-dimensional digital image, this definition implies an area given by the product of the number of pixels in the horizontal and vertical dimensions (a pixel, or picture element, is the smallest unit of information in a digital image). In this context, therefore, a high-resolution image contains a higher pixel count than a low-resolution image. However, Figure 1(a) includes features with higher perceptual quality than those in Figure 1(b), although both images have equal sizes. From the figure, therefore, we see that dimension alone is inadequate to define the resolution of an image.
Resolution, more generally, refers to the quality of a scene (image or video). Five major types of image resolution are known: pixel resolution, spectral resolution, temporal resolution, radiometric resolution, and spatial resolution. The use of these variants depends on the application. Pixel resolution refers to the total number of pixels a digital image contains. Hence, the images in Figure 1(a) and (b) possess equal pixel resolutions of 2179 × 2011. In other words, each image is approximately 4.4 megapixels (2179 × 2011 = 4,381,969 pixels ≈ 4.4 megapixels). Unfortunately, pixel count offers only a fraction of the information contained in the image. For a colored image with red, green, and blue channels, an individual pixel can only accommodate the details of a single color. Spectral resolution describes the ability of an imaging device to distinguish the frequency (or wavelength) components of an electromagnetic spectrum. Think of spectral resolution as the degree to which two different colors or light sources can be told apart. Temporal resolution refers to the rate at which an imaging device revisits the same location to acquire data. For videos, for example, the term implies the average time between consecutive video frames: a standard video camera records 30 frames per second, so it captures an image approximately every 33 ms. In remote sensing, temporal resolution is usually measured in days and represents the time a satellite sensor takes to revisit a specific location to collect data. Radiometric resolution defines the degree to which an imaging system can represent or distinguish intensity variations on the sensor. Expressed in number of bits (or number of levels), radiometric resolution indicates the actual information content of the image. Spatial resolution describes how well an imaging modality can distinguish two closely spaced objects.
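The arithmetic behind these definitions can be checked with a short sketch. The image size and frame rate come from the text; the 8-bit depth is an assumed example value used only to illustrate radiometric levels.

```python
# Pixel resolution: total pixel count of the 2179 x 2011 images in Figure 1.
width, height = 2179, 2011
pixel_count = width * height
megapixels = pixel_count / 1e6

# Temporal resolution of a 30 frames-per-second camera: time between frames.
frames_per_second = 30
frame_interval_ms = 1000 / frames_per_second

# Radiometric resolution: number of intensity levels for a given bit depth.
bit_depth = 8  # assumed example value, not taken from the text
intensity_levels = 2 ** bit_depth

print(pixel_count, round(megapixels, 1))   # 4381969 4.4
print(round(frame_interval_ms, 1))         # 33.3
print(intensity_levels)                    # 256
```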
In practical situations, spatial resolution describes the clarity of an image and defines the resolving power of an image-capturing device. The perceptual quality of an image increases with the spatial resolution. This work presents super-resolution imaging as one of the available techniques to enhance the spatial resolution of an image.
Most people are naturally inclined toward high-quality, visually appealing images that contain adequate details. However, this demand is not always met because of imperfections in the imaging process. Therefore, scholars have proposed hardware and software approaches to address the challenge. The hardware approach requires sensor modification, and it may be achieved by reducing the physical sizes of the pixels, a process that increases pixel density (number of pixels per unit area) on the surface of the sensor. This approach can give excellent resolution enhancement, but it suffers from several drawbacks: (1) it introduces shot noise into the captured images, (2) it makes the imaging device costly and unnecessarily bulky, and (3) it lowers the charge transfer rate because of the increased chip size. These challenges have prompted scholars to search for software techniques, which are cost-effective and reliable, to improve the spatial resolution of an image without affecting the circuitry of the imaging device. In this case, an image can be captured by a low-cost device and processed to generate its corresponding high-quality version.
The classical software approach that has gained considerable attention among scholars is super-resolution [3–6], which uses signal processing principles to restore high-resolution images from at least one low-resolution image. Super-resolution techniques fall into two major categories: single-frame-based, which generates a high-resolution image from a single low-resolution image [7, 8], and multi-frame-based, which exploits information from a sequence of degraded images to generate a high-quality image [2, 6]. The current work builds on the multi-frame super-resolution framework, which implicitly encourages noise reduction from the input low-resolution images. The framework bridges the total variation (TV) and Perona-Malik (PM) smoothing functionals and allows them to interact in such a way that super-resolution and preservation of critical image features are conducted simultaneously.
2. Image degradation model
The multi-frame super-resolution framework is best understood through a conceptual degradation model, which shows how an unknown high-resolution image, u, undergoes a variety of degradations to form M low-quality images, y_k, with k = 1, …, M indexing the low-resolution frames (Figure 2). In practice, the degradation of u to generate y_k involves warping, blurring, decimation (downsampling), and noising, respectively defined in this work by the operators W_k, B_k, and D_k and the noise term η_k: warping introduces rotations and translations into u, hence changing its geometrical properties; blurring reduces the sharpness of features in u; decimation samples u and lowers its physical size; and noising corrupts u with noise, assumed to be additive.
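The degradation chain described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: warping is restricted to whole-pixel translations (rotations are omitted), image borders are treated as periodic, and the function names `warp`, `blur`, `decimate`, and `degrade` are hypothetical.

```python
import numpy as np

def warp(u, dy, dx):
    """W: translate by whole pixels (dy rows, dx columns), wrapping at the borders."""
    return np.roll(u, (dy, dx), axis=(0, 1))

def blur(u, psf):
    """B: circular convolution with an odd-sized PSF centered on its middle entry."""
    cy, cx = psf.shape[0] // 2, psf.shape[1] // 2
    out = np.zeros(u.shape)
    for i in range(psf.shape[0]):
        for j in range(psf.shape[1]):
            out += psf[i, j] * np.roll(u, (i - cy, j - cx), axis=(0, 1))
    return out

def decimate(u, r):
    """D: keep every r-th pixel along both axes."""
    return u[::r, ::r]

def degrade(u, dy, dx, psf, r, sigma, rng):
    """One low-resolution frame: warp, blur, decimate, then add Gaussian noise."""
    low = decimate(blur(warp(u, dy, dx), psf), r)
    return low + sigma * rng.standard_normal(low.shape)

rng = np.random.default_rng(0)
u = rng.random((32, 32))                 # stand-in for the high-resolution image
psf = np.full((3, 3), 1.0 / 9.0)         # assumed 3 x 3 box blur
shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]
frames = [degrade(u, dy, dx, psf, 2, 0.01, rng) for dy, dx in shifts]
print(len(frames), frames[0].shape)      # 4 (16, 16)
```

Each call produces one frame of the low-resolution sequence; only the shift changes between frames, which mirrors the role of the frame index k in the model.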
Figure 2 can be expressed as
$$y_k = D_k B_k W_k u + \eta_k, \qquad k = 1, \dots, M, \tag{1}$$
which explains how the degradation model generates frame k in a set of low-resolution images. The goal of the present study is to estimate u under these degradation conditions, and one approach is to recast Eq. (1) as a minimization problem that aims to lower η_k. Therefore, using the L_p norm, where p ∈ [1, 2] (the range 0 ≤ p < 1 is excluded because it leads to nonconvex minimization problems that are susceptible to unstable solutions), the formulation to optimize u becomes
$$\hat{u} = \arg\min_{u} E(u), \qquad E(u) = \frac{1}{p} \sum_{k=1}^{M} \left\| D_k B_k W_k u - y_k \right\|_p^p, \tag{2}$$
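where E is modeled as an energy functional that measures the noise level in the degraded images. The gradient of E in Eq. (2) is
$$J_p = \sum_{k=1}^{M} W_k^T B_k^T D_k^T \left[ \operatorname{sign}\left( D_k B_k W_k u - y_k \right) \odot \left| D_k B_k W_k u - y_k \right|^{p-1} \right], \tag{4}$$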
where D_k^T is the upsampling operator, B_k^T and W_k^T are the transposes of the blurring and warping operators, which reverse those operations, and ⊙ denotes the Hadamard (element-wise) product of two matrices. The solution of Eq. (2) is obtained when J_p = 0.
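The gradient just described can be sketched with operators applied as functions rather than matrices. The sketch assumes whole-pixel translations, periodic borders, an odd-sized PSF, and the sign-times-power form of the L_p gradient; `forward`, `adjoint`, and `data_gradient` are hypothetical names introduced only for illustration.

```python
import numpy as np

def conv(u, psf):
    """Circular 2-D convolution with an odd-sized PSF centered on its middle entry."""
    cy, cx = psf.shape[0] // 2, psf.shape[1] // 2
    out = np.zeros(u.shape)
    for i in range(psf.shape[0]):
        for j in range(psf.shape[1]):
            out += psf[i, j] * np.roll(u, (i - cy, j - cx), axis=(0, 1))
    return out

def forward(u, motion, psf, r):
    """D B W u: shift by whole pixels, blur, then keep every r-th pixel."""
    return conv(np.roll(u, motion, axis=(0, 1)), psf)[::r, ::r]

def adjoint(v, motion, psf, r, shape):
    """W^T B^T D^T v: zero-fill upsample, blur with the flipped PSF, shift back."""
    up = np.zeros(shape)
    up[::r, ::r] = v
    back = conv(up, psf[::-1, ::-1])
    return np.roll(back, (-motion[0], -motion[1]), axis=(0, 1))

def data_gradient(u, frames, motions, psf, r, p):
    """Sum over frames of W^T B^T D^T [sign(res) * |res|^(p-1)], res = D B W u - y_k."""
    grad = np.zeros(u.shape)
    for y, motion in zip(frames, motions):
        res = forward(u, motion, psf, r) - y
        grad += adjoint(np.sign(res) * np.abs(res) ** (p - 1), motion, psf, r, u.shape)
    return grad

rng = np.random.default_rng(0)
truth = rng.random((8, 8))
psf = np.array([[0.0, 0.1, 0.0], [0.1, 0.5, 0.2], [0.0, 0.1, 0.0]])  # assumed PSF
motions = [(0, 0), (0, 1), (1, 0), (1, 1)]
frames = [forward(truth, m, psf, 2) for m in motions]  # noise-free frames

# At the true image every residual is zero, so the gradient vanishes.
print(np.abs(data_gradient(truth, frames, motions, psf, 2, p=2)).max())  # 0.0
```

For p = 2 the bracketed term is the residual itself, and for p = 1 it reduces to the sign of the residual, matching the two special cases discussed next.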
For p = 1, Eq. (4) evaluates to
$$J_1 = \sum_{k=1}^{M} W_k^T B_k^T D_k^T \operatorname{sign}\left( D_k B_k W_k u - y_k \right), \tag{5}$$
which shows that, after shifting and zero filling, W_k^T B_k^T D_k^T copies values from the low-resolution to the high-resolution image, and D_k B_k W_k reverses the operation. Pixel values are unaffected by these complementary operations, implying that each entry in J_1 is impacted by entries from all low-resolution images. Figure 3 shows the influence of D and of D^T on the reconstructed image. In their work, Farsiu et al. noted that the L1 minimization in Eq. (5) corresponds to the pixel-wise median, a robust estimator that handles noise and outliers in the input data well. However, the L1 norm is nondifferentiable at zero, a property that can make the minimization process unstable and generate undesirable solutions.
For p = 2, Eq. (4) becomes the gradient associated with the L2 norm minimization, or
$$J_2 = \sum_{k=1}^{M} W_k^T B_k^T D_k^T \left( D_k B_k W_k u - y_k \right), \tag{6}$$
whose solution has been proved to be the pixel-wise mean of the measurements. The L2 norm is less robust against erroneous data, but it has better mathematical properties: convexity, differentiability, and stability. Therefore, several scholars prefer L2 objective functions in situations where the data contain low noise, as in our case.
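The median and mean interpretations of the two norms can be illustrated on a single pixel observed several times. The measurement values below are invented for illustration, and the brute-force grid search merely stands in for an optimizer.

```python
import statistics

# Five hypothetical measurements of one pixel; the last one is an outlier.
measurements = [10.0, 10.2, 9.9, 10.1, 50.0]

def l1_cost(c):
    """Sum of absolute deviations from the candidate value c."""
    return sum(abs(c - m) for m in measurements)

def l2_cost(c):
    """Sum of squared deviations from the candidate value c."""
    return sum((c - m) ** 2 for m in measurements)

# Search candidate values 0.00, 0.01, ..., 60.00.
candidates = [i / 100 for i in range(6001)]
best_l1 = min(candidates, key=l1_cost)
best_l2 = min(candidates, key=l2_cost)

print(best_l1, statistics.median(measurements))  # both 10.1
print(best_l2, statistics.mean(measurements))    # both about 18.04
```

The L1 minimizer stays at 10.1 despite the outlier, whereas the L2 minimizer is dragged to about 18.04, which is the robustness trade-off described above.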
The super-resolution problem, whether formulated through the L1 or L2 norm, is ill-posed. Given the resolution factor r, for the under-determined case (M < r²) and for the square case (M = r²), the problem may admit infinitely many undesirable solutions. Also, even for a small amount of noise in the data, ill-posed problems tend to introduce large perturbations in the final solutions. These issues can be effectively addressed through regularization, which has the added advantage of speeding up the convergence of the evolving solution. This work addresses the super-resolution ill-posedness through regularization functionals from nonlinear diffusion processes, which have been reported to preserve important image features (edges, contours, and lines) [13–15]. The proposed regularizer integrates the total variation (TV) and Perona-Malik (PM) models, which complement one another to generate appealing results.
3. Hybrid super-resolution model
3.1. Regularization functionals
Considering the super-resolution ill-posedness property, a hybrid framework combining TV and PM regularization kernels has been formulated. The framework includes additional parameters, α and β, which establish a proper balance between TV and PM during regularization. The objective is to de-emphasize weaknesses of the models and amplify their strengths so that the super-resolved images are superior.
Rudin et al. established the TV model, which explains how noise in an image can be reduced. The model is based on the fact that a noisy image has a higher total variation, defined as the integral of the absolute gradient of the image, or
$$\rho(u) = \int_{\Omega} |\nabla u| \, dx, \tag{7}$$
where ρ is the TV energy functional, Ω is the domain on which u is defined, and x denotes the two-dimensional spatial coordinate on Ω. Therefore, reducing noise is equivalent to minimizing ρ. Being defined in the space of functions of bounded variation, TV functionals allow for discontinuities in the image functions. Hence, regularization through TV promotes recovery of edges, which appear as “jumps” or discontinuous parts of the image, together with effective noise removal. However, studies have revealed that TV formulations favor piecewise-constant solutions, a consequence that generates staircase effects and introduces false edges. Also, TV regularization tends to lower contrast even in noise-free or flat image regions.
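The claim that noise raises the total variation can be checked numerically. The sketch below uses a forward-difference discretization of the gradient, which is one common choice and an assumption here; the test image (a single vertical edge) and the noise level are also illustrative.

```python
import numpy as np

def total_variation(u):
    """Discrete isotropic TV: sum of gradient magnitudes from forward differences."""
    gx = np.diff(u, axis=1, append=u[:, -1:])  # horizontal differences, zero at the last column
    gy = np.diff(u, axis=0, append=u[-1:, :])  # vertical differences, zero at the last row
    return float(np.sum(np.sqrt(gx ** 2 + gy ** 2)))

# Piecewise-constant image with one vertical edge of height 1.
clean = np.zeros((32, 32))
clean[:, 16:] = 1.0

rng = np.random.default_rng(0)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

print(total_variation(clean))  # 32.0: one unit jump in each of the 32 rows
print(total_variation(noisy) > total_variation(clean))  # True
```

The edge alone contributes a total variation equal to its length, so a sharp edge is not penalized more than a smooth ramp of the same height, while noise adds variation at every pixel.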
In a similar spirit to the TV principle, Perona and Malik proposed an energy functional, ϕ, defined by
$$\phi(u) = \frac{K^2}{2} \int_{\Omega} \log\left( 1 + \frac{|\nabla u|^2}{K^2} \right) dx, \tag{8}$$
where K denotes the shape-defining constant; ϕ can be minimized to suppress noise. Minimizing Eq. (8), which originates from robust statistics, produces a nonlinear diffusion equation that embeds a fractional conduction coefficient for preserving edges. The PM energy functional in Eq. (8) is nonconvex for |∇u| > K, an undesirable property that can generate instabilities in the evolving solution. This work presents a technique that retains the convex portion, |∇u| ≤ K, and complements the nonconvex portion of the PM potential by the TV energy functional.
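The convexity threshold at |∇u| = K can be verified on a scalar version of the potential. The sketch assumes the Lorentzian form (K²/2) log(1 + s²/K²), with s standing for |∇u|; this form is an assumption chosen because it yields the fractional conduction coefficient 1/(1 + s²/K²) and changes convexity exactly at s = K. The function names are hypothetical.

```python
import math

def pm_potential(s, K):
    """Assumed PM potential as a function of the gradient magnitude s."""
    return 0.5 * K ** 2 * math.log(1.0 + (s / K) ** 2)

def pm_flux(s, K):
    """First derivative: s times the conduction coefficient 1 / (1 + (s/K)^2)."""
    return s / (1.0 + (s / K) ** 2)

def pm_second_derivative(s, K):
    """Second derivative: positive for s < K, zero at s = K, negative for s > K."""
    return (1.0 - (s / K) ** 2) / (1.0 + (s / K) ** 2) ** 2

K = 2.0  # illustrative value
for s in (0.5 * K, K, 2.0 * K):
    print(s, pm_flux(s, K), pm_second_derivative(s, K))
```

The sign change of the second derivative at s = K marks where the potential stops being convex, which is the portion the hybrid model hands over to the TV functional.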
The regularization process is often supported by the fidelity potentials
for additive noise, f = u + η, and
for multiplicative noise, f = uη, where f is the corrupted image and λ is the fidelity parameter that balances the trade-off between u and f. The fidelity term is often added to the regularization framework.
3.2. Proposed super-resolution model
The hybrid model can be derived from the minimization problem that integrates the corresponding energy functionals from super-resolution, TV, PM, and fidelity. Assuming additive noise and L2 estimator for the super-resolution part, the (regularized) minimization super-resolution problem parametrized in α and β becomes
where α, β ∈ [0, 1] and . Solving Eq. (11) using the Euler-Lagrange equation, and embedding the result into the time-dependent system gives
Eq. (12) offers both super-resolution image reconstruction and noise removal capabilities, dictated by the TV and PM models. From the equation, as t → ∞, u approaches an optimal solution, a stationary function that minimizes the energy functional, H, in Eq. (11). Eq. (12) has interesting properties for various parts of the image: in flat regions (|∇u| → 0), Eq. (12) reduces to
where C > 0 is a constant. This equation has a Laplacian term, Δu, which possesses isotropic diffusion characteristics to strongly and uniformly suppress noise in flat regions. In the neighborhood of the edges (|∇u| → ∞), Eq. (12) becomes
implying protection of edges against smoothing. This automatic interplay between reconstruction and regularization components helps to generate superior super-resolved images.
3.3. Numerical implementation
The solution of the proposed super-resolution model in Eq. (12) was iteratively estimated using the steepest descent method. Therefore, the evolution equation in Eq. (12) can be converted into a numerical system
where n denotes the iteration number that indexes the evolving solution u, and τ > 0 denotes the constant step size in the gradient direction. To encourage stability of the evolution equation in (15), the Courant-Friedrichs-Lewy condition, that is, 0 < τ ≤ 0.25, should be satisfied. In the equation, the degradation matrices W_k, B_k, and D_k and their transposes may be regarded as direct operators for image manipulation: shifting, blurring, and downsampling, along with the reverse of these operations. Given this property of the matrices, the super-resolution component of Eq. (15) can be implemented using cascaded operators without explicitly constructing the operators as matrices. This implementation strategy helps to boost the algorithmic speed and to save hardware resources.
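The matrix-free strategy can be sketched for the data term alone. This is a simplified illustration, not the proposed hybrid scheme: the regularization and fidelity terms are omitted, translations are whole pixels with periodic borders, the frames are noise-free, and all function names are hypothetical. The step size τ = 0.25 follows the bound quoted above.

```python
import numpy as np

def conv(u, psf):
    """Circular 2-D convolution with an odd-sized PSF centered on its middle entry."""
    cy, cx = psf.shape[0] // 2, psf.shape[1] // 2
    out = np.zeros(u.shape)
    for i in range(psf.shape[0]):
        for j in range(psf.shape[1]):
            out += psf[i, j] * np.roll(u, (i - cy, j - cx), axis=(0, 1))
    return out

def forward(u, motion, psf, r):
    """Cascade shift, blur, and downsample instead of multiplying by D B W."""
    return conv(np.roll(u, motion, axis=(0, 1)), psf)[::r, ::r]

def adjoint(v, motion, psf, r, shape):
    """Cascade zero-fill upsample, flipped-PSF blur, and reverse shift (W^T B^T D^T)."""
    up = np.zeros(shape)
    up[::r, ::r] = v
    back = conv(up, psf[::-1, ::-1])
    return np.roll(back, (-motion[0], -motion[1]), axis=(0, 1))

def steepest_descent(frames, motions, psf, r, shape, tau=0.25, iterations=200):
    """Iterate u <- u - tau * J_2(u) for the L2 data term only (no regularizer)."""
    u = np.zeros(shape)
    costs = []
    for _ in range(iterations):
        grad = np.zeros(shape)
        cost = 0.0
        for y, motion in zip(frames, motions):
            residual = forward(u, motion, psf, r) - y
            cost += 0.5 * float(np.sum(residual ** 2))
            grad += adjoint(residual, motion, psf, r, shape)
        costs.append(cost)
        u = u - tau * grad
    return u, costs

rng = np.random.default_rng(1)
truth = rng.random((16, 16))
psf = np.full((3, 3), 1.0 / 9.0)               # assumed box blur
motions = [(0, 0), (0, 1), (1, 0), (1, 1)]     # M = r^2 frames covering every sub-pixel phase
frames = [forward(truth, m, psf, 2) for m in motions]
estimate, costs = steepest_descent(frames, motions, psf, 2, truth.shape)
print(costs[0], costs[-1])
```

No degradation matrix is ever stored: each operator touches the image directly, so memory use grows with the image size rather than with its square.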
Eq. (15) can be represented in block form by Figure 4. In the figure, each low-resolution frame, y_k, is compared with the current estimate, u^n, of the high-resolution image. This process is undertaken by block P_k, detailed in Figure 5, an operator that represents the gradient back projection comparing the kth degraded frame with the high-resolution estimate at the nth iteration of the steepest descent method. Note from Figure 5 that T(PSF), with PSF denoting the point spread function, replaces B_k^T with a simple convolution operator. This block can be implemented by flipping the rows and columns of the PSF in the up-down and left-right directions, respectively. The gradient of the regularization term is represented by block Q, defined more explicitly in Figure 6, which ensures that the evolution process converges and gives desirable solutions.
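The flipping rule for T(PSF) can be checked with an inner-product test: convolving with the flipped PSF should act as the transpose of convolving with the PSF. The sketch assumes periodic borders and an odd-sized PSF; `flipped_psf` is a hypothetical helper name.

```python
import numpy as np

def conv(u, psf):
    """Circular 2-D convolution with an odd-sized PSF centered on its middle entry."""
    cy, cx = psf.shape[0] // 2, psf.shape[1] // 2
    out = np.zeros(u.shape)
    for i in range(psf.shape[0]):
        for j in range(psf.shape[1]):
            out += psf[i, j] * np.roll(u, (i - cy, j - cx), axis=(0, 1))
    return out

def flipped_psf(psf):
    """T(PSF): flip the PSF in the up-down and left-right directions."""
    return np.fliplr(np.flipud(psf))

rng = np.random.default_rng(0)
psf = rng.random((3, 3))
psf /= psf.sum()                      # asymmetric, normalized PSF
x = rng.random((8, 8))
y = rng.random((8, 8))

# Transpose test: <B x, y> must equal <x, B^T y>.
lhs = np.sum(conv(x, psf) * y)
rhs = np.sum(x * conv(y, flipped_psf(psf)))
print(np.isclose(lhs, rhs))           # True
```

An asymmetric PSF is used on purpose, since for a symmetric one the flip has no effect and the test would pass trivially.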
4. Experimental methodology
Several experiments were executed to determine the performance of the proposed super-resolution model relative to the classical approaches. The methodology can be explained as follows. Firstly, high-resolution images of bike, butterfly, flower, hat, parrot, Parthenon, plant, and raccoon (Figure 7) were degraded to generate the corresponding low-resolution images (Figure 8, first column). The original images were downloaded from a public-domain collection of standard test images. These images were selected because they contain detailed features, which makes it easier to compare the various super-resolution methods. As an example, the “Raccoon” image contains small-scale features (fine textures or fur) that most super-resolution approaches may find hard to restore. Degradation of the original images was achieved through warping, blurring, decimation, and noise addition to create sequences of 10 low-quality images with consecutive pairs differing by some rotation and translation motions. To avoid the impact of registration errors on the reconstruction process, the warping matrix was fixed. Thus, for the 10 low-resolution images, the warping matrix for the horizontal and vertical displacements, respectively denoted by ∆x and ∆y, was defined as follows:
Next, super-resolution methods based on a variety of regularizers, namely NC00, TV, ANDIFF, and Hybrid, were applied to the degraded images to restore their original versions. Lastly, an objective metric, namely feature similarity (FSIM), and subjective visual assessment were used to compare the performance of the different methods. FSIM incorporates some aspects of the human visual system into its formulation, and hence the metric is considered superior to several other existing image quality metrics. A higher FSIM value indicates a more visually appealing image.
5. Results and discussions
Visual results show that the classical methods tend to add undesirable artificial features to the reconstructed images (Figure 8). For instance, NC00 introduces bubble-like features around borders, edges, and corners, which are the critical features to which the human visual system is sensitive. The method, on the other hand, does well on homogeneous image regions. The super-resolution method based on TV produces relatively sharper images, but it also adds artifacts to homogeneous parts of the final images, an effect that degrades their visual quality. The ANDIFF method generates smoother results that contain few artifacts, but it underperforms on highly textured images such as the Raccoon. The proposed hybrid model establishes a proper balance between smoothness and critical feature preservation (Figure 8, last column). Visually, the images reconstructed by our approach are more natural and are free from obvious artifacts. One may argue that our results are slightly blurry. However, given the higher capability of the proposed method to preserve sensitive image features, this effect may be tolerated. Also, the line graphs (taken near the last row across all columns) further suggest that the proposed method is superior because it generates a one-dimensional curve that closely matches the original one (Figure 9).
Numerical results demonstrate that, for all input images, the proposed super-resolution method achieves higher FSIM values (Table 1). These objective observations can be explained from the formulation in Eq. (12): the hybrid super-resolution model captures the qualities of both PM and TV, an advantage that may promote higher objective quality results. Besides, our formulation incorporates parameters that give an effective interplay between the regularization functionals.
In this work, we have established a hybrid super-resolution framework that combines desirable features of TV and PM models. The framework has been parametrized to mask weaknesses of the models, introduce an automatic interplay between TV and PM regularizations, and promote appealing results. More emphasis was put on super-resolving low-quality images while retaining their naturalness and preserving their sensitive image features. Experimental results demonstrate that the proposed framework generates superior objective and subjective results.